#
d52734d0 |
|
24-Jan-2024 |
Maciej S. Szmigiero <maciej.szmigiero@oracle.com> |
KVM: x86: Give a hint when Win2016 might fail to boot due to XSAVES erratum Since commit b0563468eeac ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17") kernel unconditionally clears the XSAVES CPU feature bit on Zen1/2 CPUs. Because KVM CPU caps are initialized from the kernel boot CPU features this makes the XSAVES feature also unavailable for KVM guests in this case. At the same time the XSAVEC feature is left enabled. Unfortunately, having XSAVEC but no XSAVES in CPUID breaks Hyper-V enabled Windows Server 2016 VMs that have more than one vCPU. Let's at least give users hint in the kernel log what could be wrong since these VMs currently simply hang at boot with a black screen - giving no clue what suddenly broke them and how to make them work again. Trigger the kernel message hint based on the particular guest ID written to the Guest OS Identity Hyper-V MSR implemented by KVM. Defer this check to when the L1 Hyper-V hypervisor enables SVM in EFER since we want to limit this message to Hyper-V enabled Windows guests only (Windows session running nested as L2) but the actual Guest OS Identity MSR write is done by L1 and happens before it enables SVM. Fixes: b0563468eeac ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <b83ab45c5e239e5d148b0ae7750133a67ac9575c.1706127425.git.maciej.szmigiero@oracle.com> [Move some checks before mutex_lock(), rename function. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3652117f |
|
22-Nov-2023 |
Christian Brauner <brauner@kernel.org> |
eventfd: simplify eventfd_signal() Ever since the eventfd type was introduced back in 2007 in commit e1ad7468c77d ("signal/timer/event: eventfd core") the eventfd_signal() function only ever passed 1 as a value for @n. There's no point in keeping that additional argument. Link: https://lore.kernel.org/r/20231122-vfs-eventfd-signal-v2-2-bd549b14ce0c@kernel.org Acked-by: Xu Yilun <yilun.xu@intel.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl Acked-by: Eric Farman <farman@linux.ibm.com> # s390 Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
d6800af5 |
|
17-Oct-2023 |
Nicolas Saenz Julienne <nsaenz@amazon.com> |
KVM: x86: hyper-v: Don't auto-enable stimer on write from user-space Don't apply the stimer's counter side effects when modifying its value from user-space, as this may trigger spurious interrupts. For example: - The stimer is configured in auto-enable mode. - The stimer's count is set and the timer enabled. - The stimer expires, an interrupt is injected. - The VM is live migrated. - The stimer config and count are deserialized, auto-enable is ON, the stimer is re-enabled. - The stimer expires right away, and injects an unwarranted interrupt. Cc: stable@vger.kernel.org Fixes: 1f4b34f825e8 ("kvm/x86: Hyper-V SynIC timers") Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20231017155101.40677-1-nsaenz@amazon.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
765da7fe |
|
07-Aug-2023 |
Like Xu <likexu@tencent.com> |
KVM: x86: Remove break statements that will never be executed Fix compiler warnings when compiling KVM with [-Wunreachable-code-break]. No functional change intended. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230807094243.32516-1-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
db9cf24c |
|
12-Dec-2022 |
Vipin Sharma <vipinsh@google.com> |
KVM: x86: hyper-v: Add extended hypercall support in Hyper-v Add support for extended hypercall in Hyper-v. Hyper-v TLFS 6.0b describes hypercalls above call code 0x8000 as extended hypercalls. A Hyper-v hypervisor's guest VM finds availability of extended hypercalls via CPUID.0x40000003.EBX BIT(20). If the bit is set then the guest can call extended hypercalls. All extended hypercalls will exit to userspace by default. This allows for easy support of future hypercalls without being dependent on KVM releases. If there will be need to process the hypercall in KVM instead of userspace then KVM can create a capability which userspace can query to know which hypercalls can be handled by the KVM and enable handling of those hypercalls. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20221212183720.4062037-10-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
1a9df326 |
|
12-Dec-2022 |
Vipin Sharma <vipinsh@google.com> |
KVM: x86: hyper-v: Use common code for hypercall userspace exit Remove duplicate code to exit to userspace for hyper-v hypercalls and use a common place to exit. No functional change intended. Signed-off-by: Vipin Sharma <vipinsh@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20221212183720.4062037-9-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
e76ae527 |
|
24-Jan-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: x86/pmu: Gate all "unimplemented MSR" prints on report_ignored_msrs Add helpers to print unimplemented MSR accesses and condition all such prints on report_ignored_msrs, i.e. honor userspace's request to not print unimplemented MSRs. Even though vcpu_unimpl() is ratelimited, printing can still be problematic, e.g. if a print gets stalled when host userspace is writing MSRs during live migration, an effective stall can result in very noticeable disruption in the guest. E.g. the profile below was taken while calling KVM_SET_MSRS on the PMU counters while the PMU was disabled in KVM. - 99.75% 0.00% [.] __ioctl - __ioctl - 99.74% entry_SYSCALL_64_after_hwframe do_syscall_64 sys_ioctl - do_vfs_ioctl - 92.48% kvm_vcpu_ioctl - kvm_arch_vcpu_ioctl - 85.12% kvm_set_msr_ignored_check svm_set_msr kvm_set_msr_common printk vprintk_func vprintk_default vprintk_emit console_unlock call_console_drivers univ8250_console_write serial8250_console_write uart_console_write Reported-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230124234905.3774678-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
8d20bd63 |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Unify pr_fmt to use module name for all KVM modules Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks use consistent formatting across common x86, Intel, and AMD code. In addition to providing consistent print formatting, using KBUILD_MODNAME, e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and SGX and ...) as technologies without generating weird messages, and without causing naming conflicts with other kernel code, e.g. "SEV: ", "tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems. Opportunistically move away from printk() for prints that need to be modified anyways, e.g. to drop a manual "kvm: " prefix. Opportunistically convert a few SGX WARNs that are similarly modified to WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good that they would fire repeatedly and spam the kernel log without providing unique information in each print. Note, defining pr_fmt yields undesirable results for code that uses KVM's printk wrappers, e.g. vcpu_unimpl(). But, that's a pre-existing problem as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's wrappers is relatively limited in KVM x86 code. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Paul Durrant <paul@xen.org> Message-Id: <20221130230934.1014142-35-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2be1bd3a |
|
13-Oct-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Hyper-V invariant TSC control Normally, genuine Hyper-V doesn't expose architectural invariant TSC (CPUID.80000007H:EDX[8]) to its guests by default. A special PV MSR (HV_X64_MSR_TSC_INVARIANT_CONTROL, 0x40000118) and corresponding CPUID feature bit (CPUID.0x40000003.EAX[15]) were introduced. When bit 0 of the PV MSR is set, invariant TSC bit starts to show up in CPUID. When the feature is exposed to Hyper-V guests, reenlightenment becomes unneeded. Add the feature to KVM. Keep CPUID output intact when the feature wasn't exposed to L1 and implement the required logic for hiding invariant TSC when the feature was exposed and invariant TSC control MSR wasn't written to. Copy genuine Hyper-V behavior and forbid to disable the feature once it was enabled. For the reference, for linux guests, support for the feature was added in commit dce7cd62754b ("x86/hyperv: Allow guests to enable InvariantTSC"). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013095849.705943-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4e5bf89f |
|
13-Oct-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Hyper-V invariant TSC control Normally, genuine Hyper-V doesn't expose architectural invariant TSC (CPUID.80000007H:EDX[8]) to its guests by default. A special PV MSR (HV_X64_MSR_TSC_INVARIANT_CONTROL, 0x40000118) and corresponding CPUID feature bit (CPUID.0x40000003.EAX[15]) were introduced. When bit 0 of the PV MSR is set, invariant TSC bit starts to show up in CPUID. When the feature is exposed to Hyper-V guests, reenlightenment becomes unneeded. Add the feature to KVM. Keep CPUID output intact when the feature wasn't exposed to L1 and implement the required logic for hiding invariant TSC when the feature was exposed and invariant TSC control MSR wasn't written to. Copy genuine Hyper-V behavior and forbid to disable the feature once it was enabled. For the reference, for linux guests, support for the feature was added in commit dce7cd62754b ("x86/hyperv: Allow guests to enable InvariantTSC"). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013095849.705943-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
8b9e13d2 |
|
08-Dec-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Fix 'using uninitialized value' Coverity warning In kvm_hv_flush_tlb(), 'data_offset' and 'consumed_xmm_halves' variables are used in a mutually exclusive way: in 'hc->fast' we count in 'XMM halves' and increase 'data_offset' otherwise. Coverity discovered, that in one case both variables are incremented unconditionally. This doesn't seem to cause any issues as the only user of 'data_offset'/'consumed_xmm_halves' data is kvm_hv_get_tlb_flush_entries() -> kvm_hv_get_hc_data() which also takes into account 'hc->fast' but is still worth fixing. To make things explicit, put 'data_offset' and 'consumed_xmm_halves' to 'struct kvm_hv_hcall' as a union and use at call sites. This allows to remove explicit 'data_offset'/'consumed_xmm_halves' parameters from kvm_hv_get_hc_data()/kvm_get_sparse_vp_set()/kvm_hv_get_tlb_flush_entries() helpers. Note: 'struct kvm_hv_hcall' is allocated on stack in kvm_hv_hypercall() and is not zeroed, consumers are supposed to initialize the appropriate field if needed. Reported-by: coverity-bot <keescook+coverity-bot@chromium.org> Addresses-Coverity-ID: 1527764 ("Uninitialized variables") Fixes: 260970862c88 ("KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221208102700.959630-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0c2a0412 |
|
02-Dec-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: remove unnecessary exports Several symbols are not used by vendor modules but still exported. Removing them ensures that new coupling between kvm.ko and kvm-*.ko is noticed and reviewed. Co-developed-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f4de6a1f |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Expose Hyper-V L2 TLB flush feature With both nSVM and nVMX implementations in place, KVM can now expose Hyper-V L2 TLB flush feature to userspace. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-30-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b415d8d4 |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Make kvm_hv_get_assist_page() return 0/-errno Convert kvm_hv_get_assist_page() to return 'int' and propagate possible errors from kvm_read_guest_cached(). Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-28-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
046f5756 |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nVMX: hyper-v: Cache VP assist page in 'struct kvm_vcpu_hv' In preparation to enabling L2 TLB flush, cache VP assist page in 'struct kvm_vcpu_hv'. While on it, rename nested_enlightened_vmentry() to nested_get_evmptr() and make it return eVMCS GPA directly. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-26-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c58a318f |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: L2 TLB flush Handle L2 TLB flush requests by going through all vCPUs and checking whether there are vCPUs running the same VM_ID with a VP_ID specified in the requests. Perform synthetic exit to L2 upon finish. Note, while checking VM_ID/VP_ID of running vCPUs seem to be a bit racy, we count on the fact that KVM flushes the whole L2 VPID upon transition. Also, KVM_REQ_HV_TLB_FLUSH request needs to be done upon transition between L1 and L2 to make sure all pending requests are always processed. For the reference, Hyper-V TLFS refers to the feature as "Direct Virtual Flush". Note, nVMX/nSVM code does not handle VMCALL/VMMCALL from L2 yet. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-24-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7d5e88d3 |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks' To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUs as L1 may use vCPU overcommit for L2. To avoid growing on-stack allocation, make 'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated dynamically. Note: sparse_set_to_vcpu_mask() can't currently be used to handle L2 requests as KVM does not keep L2 VM_ID -> L2 VCPU_ID -> L1 vCPU mappings, i.e. its vp_bitmap array is still bounded by the number of L1 vCPUs and so can remain an on-stack allocation. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-19-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
53ca765a |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Create a separate fifo for L2 TLB flush To handle L2 TLB flush requests, KVM needs to use a separate fifo from regular (L1) Hyper-V TLB flush requests: e.g. when a request to flush something in L2 is made, the target vCPU can transition from L2 to L1, receive a request to flush a GVA for L1 and then try to enter L2 back. The first request needs to be processed at this point. Similarly, requests to flush GVAs in L1 must wait until L2 exits to L1. No functional change as KVM doesn't handle L2 TLB flush requests from L2 yet. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-18-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b6c2c22f |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi() Get rid of on-stack allocation of vcpu_mask and optimize kvm_hv_send_ipi() for a smaller number of vCPUs in the request. When Hyper-V TLB flush is in use, HvSendSyntheticClusterIpi{,Ex} calls are not commonly used to send IPIs to a large number of vCPUs (and are rarely used in general). Introduce hv_is_vp_in_sparse_set() to directly check if the specified VP_ID is present in sparse vCPU set. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-17-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ca7372ac |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw '64' It may not be clear from where the '64' limit for the maximum sparse bank number comes from, use HV_MAX_SPARSE_VCPU_BANKS define instead. Use HV_VCPUS_PER_SPARSE_BANK in KVM_HV_MAX_SPARSE_VCPU_SET_BITS's definition. Opportunistically adjust the comment around BUILD_BUG_ON(). No functional change. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-16-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
aee73823 |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs To handle L2 TLB flush requests, KVM needs to translate the specified L2 GPA to L1 GPA to read hypercall arguments from there. No functional change as KVM doesn't handle VMCALL/VMMCALL from L2 yet. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-14-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f84fcb66 |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Expose support for extended gva ranges for flush hypercalls Extended GVA ranges support bit seems to indicate whether lower 12 bits of GVA can be used to specify up to 4095 additional consequent GVAs to flush. This is somewhat described in TLFS. Previously, KVM was handling HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} requests by flushing the whole VPID so technically, extended GVA ranges were already supported. As such requests are handled more gently now, advertizing support for extended ranges starts making sense to reduce the size of TLB flush requests. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-13-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
26097086 |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently Currently, HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls are handled the exact same way as HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE{,EX}: by flushing the whole VPID and this is sub-optimal. Switch to handling these requests with 'flush_tlb_gva()' hooks instead. Use the newly introduced TLB flush fifo to queue the requests. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-12-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
56b5354f |
|
01-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: hyper-v: Add helper to read hypercall data for array Move the guts of kvm_get_sparse_vp_set() to a helper so that the code for reading a guest-provided array can be reused in the future, e.g. for getting a list of virtual addresses whose TLB entries need to be flushed. Opportunisticaly swap the order of the data and XMM adjustment so that the XMM/gpa offsets are bundled together. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-11-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0823570f |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Introduce TLB flush fifo To allow flushing individual GVAs instead of always flushing the whole VPID a per-vCPU structure to pass the requests is needed. Use standard 'kfifo' to queue two types of entries: individual GVA (GFN + up to 4095 following GFNs in the lower 12 bits) and 'flush all'. The size of the fifo is arbitrarily set to '16'. Note, kvm_hv_flush_tlb() only queues 'flush all' entries for now and kvm_hv_vcpu_flush_tlb() doesn't actually read the fifo just resets the queue before returning -EOPNOTSUPP (which triggers full TLB flush) so the functional change is very small but the infrastructure is prepared to handle individual GVA flush requests. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-10-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
adc43caa |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag In preparation to implementing fine-grained Hyper-V TLB flush and L2 TLB flush, resurrect dedicated KVM_REQ_HV_TLB_FLUSH request bit. As KVM_REQ_TLB_FLUSH_GUEST is a stronger operation, clear KVM_REQ_HV_TLB_FLUSH request in kvm_vcpu_flush_tlb_guest(). The flush itself is temporary handled by kvm_vcpu_flush_tlb_guest(). No functional change intended. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-9-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4da77090 |
|
30-Aug-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nVMX: Support PERF_GLOBAL_CTRL with enlightened VMCS Enlightened VMCS v1 got updated and now includes the required fields for loading PERF_GLOBAL_CTRL upon VMENTER/VMEXIT features. For KVM on Hyper-V enablement, KVM can just observe VMX control MSRs and use the features (with or without eVMCS) when possible. Hyper-V on KVM is messier as Windows 11 guests fail to boot if the controls are advertised and a new PV feature flag, CPUID.0x4000000A.EBX BIT(0), is not set. Honor the Hyper-V CPUID feature flag to play nice with Windows guests. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20220830133737.1539624-16-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
dea6e140 |
|
30-Aug-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Cache HYPERV_CPUID_NESTED_FEATURES CPUID leaf KVM has to check guest visible HYPERV_CPUID_NESTED_FEATURES.EBX CPUID leaf to know which Enlightened VMCS definition to use (original or 2022 update). Cache the leaf along with other Hyper-V CPUID feature leaves to make the check quick. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-12-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3be29eb7 |
|
30-Aug-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Report error when setting CPUID if Hyper-V allocation fails Return -ENOMEM back to userspace if allocating the Hyper-V vCPU struct fails when enabling Hyper-V in guest CPUID. Silently ignoring failure means that KVM will not have an up-to-date CPUID cache if allocating the struct succeeds later on, e.g. when activating SynIC. Rejecting the CPUID operation also guarantess that vcpu->arch.hyperv is non-NULL if hyperv_enabled is true, which will allow for additional cleanup, e.g. in the eVMCS code. Note, the initialization needs to be done before CPUID is set, and more subtly before kvm_check_cpuid(), which potentially enables dynamic XFEATURES. Sadly, there's no easy way to avoid exposing Hyper-V details to CPUID or vice versa. Expose kvm_hv_vcpu_init() and the Hyper-V CPUID signature to CPUID instead of exposing cpuid_entry2_find() outside of CPUID code. It's hard to envision kvm_hv_vcpu_init() being misused, whereas cpuid_entry2_find() absolutely shouldn't be used outside of core CPUID code. Fixes: 10d7bf1e46dc ("KVM: x86: hyper-v: Cache guest CPUID leaves determining features availability") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-6-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1cac8d9f |
|
30-Aug-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Check for existing Hyper-V vCPU in kvm_hv_vcpu_init() When potentially allocating/initializing the Hyper-V vCPU struct, check for an existing instance in kvm_hv_vcpu_init() instead of requiring callers to perform the check. Relying on callers to do the check is risky as it's all too easy for KVM to overwrite vcpu->arch.hyperv and leak memory, and it adds additional burden on callers without much benefit. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20220830133737.1539624-5-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ce2196b8 |
|
30-Aug-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Zero out entire Hyper-V CPUID cache before processing entries Wipe the whole 'hv_vcpu->cpuid_cache' with memset() instead of having to zero each particular member when the corresponding CPUID entry was not found. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> [sean: split to separate patch] Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20220830133737.1539624-4-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
277ad7d5 |
|
11-Jul-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Add dedicated helper to get CPUID entry with significant index Add a second CPUID helper, kvm_find_cpuid_entry_index(), to handle KVM queries for CPUID leaves whose index _may_ be significant, and drop the index param from the existing kvm_find_cpuid_entry(). Add a WARN in the inner helper, cpuid_entry2_find(), to detect attempts to retrieve a CPUID entry whose index is significant without explicitly providing an index. Using an explicit magic number and letting callers omit the index avoids confusion by eliminating the myriad cases where KVM specifies '0' as a dummy value. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d603fd8d |
|
09-May-2022 |
Yury Norov <yury.norov@gmail.com> |
KVM: x86: hyper-v: replace bitmap_weight() with hweight64() kvm_hv_flush_tlb() applies bitmap API to a u64 variable valid_bank_mask. Since valid_bank_mask has a fixed size, we can use hweight64() and avoid excessive bloating. CC: Borislav Petkov <bp@alien8.de> CC: Dave Hansen <dave.hansen@linux.intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Ingo Molnar <mingo@redhat.com> CC: Jim Mattson <jmattson@google.com> CC: Joerg Roedel <joro@8bytes.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Sean Christopherson <seanjc@google.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Wanpeng Li <wanpengli@tencent.com> CC: kvm@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: x86@kernel.org Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
#
a7ef9b45 |
|
19-May-2022 |
Yury Norov <yury.norov@gmail.com> |
KVM: x86: hyper-v: fix type of valid_bank_mask In kvm_hv_flush_tlb(), valid_bank_mask is declared as unsigned long, but is used as u64, which is wrong for i386, and has been spotted by LKP after applying "KVM: x86: hyper-v: replace bitmap_weight() with hweight64()" https://lore.kernel.org/lkml/20220510154750.212913-12-yury.norov@gmail.com/ But it's wrong even without that patch because now bitmap_weight() dereferences a word after valid_bank_mask on i386. >> include/asm-generic/bitops/const_hweight.h:21:76: warning: right shift count >= width of type +[-Wshift-count-overflow] 21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32)) | ^~ include/asm-generic/bitops/const_hweight.h:10:16: note: in definition of macro '__const_hweight8' 10 | ((!!((w) & (1ULL << 0))) + \ | ^ include/asm-generic/bitops/const_hweight.h:20:31: note: in expansion of macro '__const_hweight16' 20 | #define __const_hweight32(w) (__const_hweight16(w) + __const_hweight16((w) >> 16)) | ^~~~~~~~~~~~~~~~~ include/asm-generic/bitops/const_hweight.h:21:54: note: in expansion of macro '__const_hweight32' 21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32)) | ^~~~~~~~~~~~~~~~~ include/asm-generic/bitops/const_hweight.h:29:49: note: in expansion of macro '__const_hweight64' 29 | #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w)) | ^~~~~~~~~~~~~~~~~ arch/x86/kvm/hyperv.c:1983:36: note: in expansion of macro 'hweight64' 1983 | if (hc->var_cnt != hweight64(valid_bank_mask)) | ^~~~~~~~~ CC: Borislav Petkov <bp@alien8.de> CC: Dave Hansen <dave.hansen@linux.intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Ingo Molnar <mingo@redhat.com> CC: Jim Mattson <jmattson@google.com> CC: Joerg Roedel <joro@8bytes.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Sean Christopherson <seanjc@google.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Wanpeng Li <wanpengli@tencent.com> CC: kvm@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: x86@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Message-Id: <20220519171504.1238724-1-yury.norov@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
fe06a0c0 |
|
23-Jan-2022 |
Yury Norov <yury.norov@gmail.com> |
KVM: x86: replace bitmap_weight with bitmap_empty where appropriate In some places kvm/hyperv.c code calls bitmap_weight() to check if any bit of a given bitmap is set. It's better to use bitmap_empty() in that case because bitmap_empty() stops traversing the bitmap as soon as it finds first set bit, while bitmap_weight() counts all bits unconditionally. Signed-off-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
|
#
ea8c66fe |
|
19-May-2022 |
Yury Norov <yury.norov@gmail.com> |
KVM: x86: hyper-v: fix type of valid_bank_mask In kvm_hv_flush_tlb(), valid_bank_mask is declared as unsigned long, but is used as u64, which is wrong for i386, and has been spotted by LKP after applying "KVM: x86: hyper-v: replace bitmap_weight() with hweight64()" https://lore.kernel.org/lkml/20220510154750.212913-12-yury.norov@gmail.com/ But it's wrong even without that patch because now bitmap_weight() dereferences a word after valid_bank_mask on i386. >> include/asm-generic/bitops/const_hweight.h:21:76: warning: right shift count >= width of type +[-Wshift-count-overflow] 21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32)) | ^~ include/asm-generic/bitops/const_hweight.h:10:16: note: in definition of macro '__const_hweight8' 10 | ((!!((w) & (1ULL << 0))) + \ | ^ include/asm-generic/bitops/const_hweight.h:20:31: note: in expansion of macro '__const_hweight16' 20 | #define __const_hweight32(w) (__const_hweight16(w) + __const_hweight16((w) >> 16)) | ^~~~~~~~~~~~~~~~~ include/asm-generic/bitops/const_hweight.h:21:54: note: in expansion of macro '__const_hweight32' 21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32)) | ^~~~~~~~~~~~~~~~~ include/asm-generic/bitops/const_hweight.h:29:49: note: in expansion of macro '__const_hweight64' 29 | #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w)) | ^~~~~~~~~~~~~~~~~ arch/x86/kvm/hyperv.c:1983:36: note: in expansion of macro 'hweight64' 1983 | if (hc->var_cnt != hweight64(valid_bank_mask)) | ^~~~~~~~~ CC: Borislav Petkov <bp@alien8.de> CC: Dave Hansen <dave.hansen@linux.intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Ingo Molnar <mingo@redhat.com> CC: Jim Mattson <jmattson@google.com> CC: Joerg Roedel <joro@8bytes.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Sean Christopherson <seanjc@google.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Wanpeng Li <wanpengli@tencent.com> CC: kvm@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: x86@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Message-Id: <20220519171504.1238724-1-yury.norov@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
42dcbe7d |
|
07-Apr-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU The following WARN is triggered from kvm_vm_ioctl_set_clock(): WARNING: CPU: 10 PID: 579353 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3161 mark_page_dirty_in_slot+0x6c/0x80 [kvm] ... CPU: 10 PID: 579353 Comm: qemu-system-x86 Tainted: G W O 5.16.0.stable #20 Hardware name: LENOVO 20UF001CUS/20UF001CUS, BIOS R1CET65W(1.34 ) 06/17/2021 RIP: 0010:mark_page_dirty_in_slot+0x6c/0x80 [kvm] ... Call Trace: <TASK> ? kvm_write_guest+0x114/0x120 [kvm] kvm_hv_invalidate_tsc_page+0x9e/0xf0 [kvm] kvm_arch_vm_ioctl+0xa26/0xc50 [kvm] ? schedule+0x4e/0xc0 ? __cond_resched+0x1a/0x50 ? futex_wait+0x166/0x250 ? __send_signal+0x1f1/0x3d0 kvm_vm_ioctl+0x747/0xda0 [kvm] ... The WARN was introduced by commit 03c0304a86bc ("KVM: Warn if mark_page_dirty() is called without an active vCPU") but the change seems to be correct (unlike Hyper-V TSC page update mechanism). In fact, there's no real need to actually write to guest memory to invalidate TSC page, this can be done by the first vCPU which goes through kvm_guest_time_update(). Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220407201013.963226-1-vkuznets@redhat.com>
|
#
320af55a |
|
10-Mar-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Add wrappers for setting/clearing APICv inhibits Add set/clear wrappers for toggling APICv inhibits to make the call sites more readable, and opportunistically rename the inner helpers to align with the new wrappers and to make them more readable as well. Invert the flag from "activate" to "set"; activate is painfully ambiguous as it's not obvious if the inhibit is being activated, or if APICv is being activated, in which case the inhibit is being deactivated. For the functions that take @set, swap the order of the inhibit reason and @set so that the call sites are visually similar to those that bounce through the wrapper. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220311043517.17027-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b1e34d32 |
|
25-Mar-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated Setting non-zero values to SYNIC/STIMER MSRs activates certain features, this should not happen when KVM_CAP_HYPERV_SYNIC{,2} was not activated. Note, it would've been better to forbid writing anything to SYNIC/STIMER MSRs, including zeroes, however, at least QEMU tries clearing HV_X64_MSR_STIMER0_CONFIG without SynIC. HV_X64_MSR_EOM MSR is somewhat 'special' as writing zero there triggers an action, this also should not happen when SynIC wasn't activated. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220325132140.25650-4-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7ec37d1c |
|
25-Mar-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Check lapic_in_kernel() before attempting to set a SynIC irq When KVM_CAP_HYPERV_SYNIC{,2} is activated, KVM already checks for irqchip_in_kernel() so normally SynIC irqs should never be set. It is, however, possible for a misbehaving VMM to write to SYNIC/STIMER MSRs causing erroneous behavior. The immediate issue being fixed is that kvm_irq_delivery_to_apic() (kvm_irq_delivery_to_apic_fast()) crashes when called with 'irq.shorthand = APIC_DEST_SELF' and 'src == NULL'. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220325132140.25650-2-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
47d3e5cd |
|
22-Feb-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: HVCALL_SEND_IPI_EX is an XMM fast hypercall It has been proven on practice that at least Windows Server 2019 tries using HVCALL_SEND_IPI_EX in 'XMM fast' mode when it has more than 64 vCPUs and it needs to send an IPI to a vCPU > 63. Similarly to other XMM Fast hypercalls (HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}{,_EX}), this information is missing in TLFS as of 6.0b. Currently, KVM returns an error (HV_STATUS_INVALID_HYPERCALL_INPUT) and Windows crashes. Note, HVCALL_SEND_IPI is a 'standard' fast hypercall (not 'XMM fast') as all its parameters fit into RDX:R8 and this is handled by KVM correctly. Cc: stable@vger.kernel.org # 5.14.x: 3244867af8c0: KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req Cc: stable@vger.kernel.org # 5.14.x Fixes: d8f5537a8816 ("KVM: hyper-v: Advertise support for fast XMM hypercalls") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220222154642.684285-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7321f47e |
|
22-Feb-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Fix the maximum number of sparse banks for XMM fast TLB flush hypercalls When TLB flush hypercalls (HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX are issued in 'XMM fast' mode, the maximum number of allowed sparse_banks is not 'HV_HYPERCALL_MAX_XMM_REGISTERS - 1' (5) but twice as many (10) as each XMM register is 128 bit long and can hold two 64 bit long banks. Cc: stable@vger.kernel.org # 5.14.x Fixes: 5974565bc26d ("KVM: x86: kvm_hv_flush_tlb use inputs from XMM registers") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220222154642.684285-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
82c1ead0 |
|
22-Feb-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_flush_tlb() 'struct kvm_hv_hcall' has all the required information already, there's no need to pass 'ex' additionally. No functional change intended. Cc: stable@vger.kernel.org # 5.14.x Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220222154642.684285-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
50e523dd |
|
22-Feb-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_send_ipi() 'struct kvm_hv_hcall' has all the required information already, there's no need to pass 'ex' additionally. No functional change intended. Cc: stable@vger.kernel.org # 5.14.x Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220222154642.684285-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
66c03a92 |
|
02-Feb-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: Implement Enlightened MSR-Bitmap feature Similar to nVMX commit 502d2bf5f2fd ("KVM: nVMX: Implement Enlightened MSR Bitmap feature"), add support for the feature for nSVM (Hyper-V on KVM). Notable differences from nVMX implementation: - As the feature uses SW reserved fields in VMCB control, KVM needs to make sure it's dealing with a Hyper-V guest (kvm_hv_hypercall_enabled()). - 'msrpm_base_pa' needs to be always be overwritten in nested_svm_vmrun_msrpm(), even when the update is skipped. As an optimization, nested_vmcb02_prepare_control() copies it from VMCB01 so when MSR-Bitmap feature for L2 is disabled nothing needs to be done. - 'struct vmcb_ctrl_area_cached' needs to be extended with clean fields/sw reserved data and __nested_copy_vmcb_control_to_cache() needs to copy it so nested_svm_vmrun_msrpm() can use it later. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220202095100.129834-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ce385917 |
|
02-Feb-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Make kvm_hv_hypercall_enabled() static inline In preparation for using kvm_hv_hypercall_enabled() from SVM code, make it static inline to avoid the need to export it. The function is a simple check with only two call sites currently. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220202095100.129834-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
413af660 |
|
07-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Add checks for reserved-to-zero Hyper-V hypercall fields Add checks for the three fields in Hyper-V's hypercall params that must be zero. Per the TLFS, HV_STATUS_INVALID_HYPERCALL_INPUT is returned if "A reserved bit in the specified hypercall input value is non-zero." Note, some versions of the TLFS have an off-by-one bug for the last reserved field, and define it as being bits 64:60. See https://github.com/MicrosoftDocs/Virtualization-Documentation/pull/1682. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
40421f38 |
|
07-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Reject fixeds-size Hyper-V hypercalls with non-zero "var_cnt" Reject Hyper-V hypercalls if the guest specifies a non-zero variable size header (var_cnt in KVM) for a hypercall that has a fixed header size. Per the TLFS: It is illegal to specify a non-zero variable header size for a hypercall that is not explicitly documented as accepting variable sized input headers. In such a case the hypercall will result in a return code of HV_STATUS_INVALID_HYPERCALL_INPUT. Note, at least some of the various DEBUG commands likely aren't allowed to use variable size headers, but the TLFS documentation doesn't clearly state what is/isn't allowed. Omit them for now to avoid unnecessary breakage. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9c52f6b3 |
|
07-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Shove vp_bitmap handling down into sparse_set_to_vcpu_mask() Move the vp_bitmap "allocation" that's needed to handle mismatched vp_index values down into sparse_set_to_vcpu_mask() and drop __always_inline from said helper. The need for an intermediate vp_bitmap is a detail that's specific to the sparse translation with mismatched VP<=>vCPU indexes and does not need to be exposed to the caller. Regarding the __always_inline, prior to commit f21dd494506a ("KVM: x86: hyperv: optimize sparse VP set processing") the helper, then named hv_vcpu_in_sparse_set(), was a tiny bit of code that effectively boiled down to a handful of bit ops. The __always_inline was understandable, if not justifiable. Since the aforementioned change, sparse_set_to_vcpu_mask() is a chunky 350-450+ bytes of code without KASAN=y, and balloons to 1100+ with KASAN=y. In other words, it has no business being forcefully inlined. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
79661c37 |
|
07-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Don't bother reading sparse banks that end up being ignored When handling "sparse" VP_SET requests, don't read sparse banks that can't possibly contain a legal VP index instead of ignoring such banks later on in sparse_set_to_vcpu_mask(). This allows KVM to cap the size of its sparse_banks arrays for VP_SET at KVM_HV_MAX_SPARSE_VCPU_SET_BITS. Add a compile time assert that KVM_HV_MAX_SPARSE_VCPU_SET_BITS<=64, i.e. that KVM_MAX_VCPUS<=4096, as the TLFS allows for at most 64 sparse banks, and KVM will need to do _something_ to play nice with Hyper-V. Reducing the size of sparse_banks fudges around a compilation warning (that becomes error with KVM_WERROR=y) when CONFIG_KASAN_STACK=y, which is selected (and can't be unselected) by CONFIG_KASAN=y when using gcc (clang/LLVM is a stack hog in some cases so it's opt-in for clang). KASAN_STACK adds a redzone around every stack variable, which pushes the Hyper-V functions over the default limit of 1024. Ideally, KVM would flat out reject such impossibilities, but the TLFS explicitly allows providing empty banks, even if a bank can't possibly contain a valid VP index due to its position exceeding KVM's max. Furthermore, for a bit 1 in ValidBankMask, it is valid state for the corresponding element in BanksContents can be all 0s, meaning no processors are specified in this bank. Arguably KVM should reject and not ignore the "extra" banks, but that can be done independently and without bloating sparse_banks, e.g. by reading each "extra" 8-byte chunk individually. Reported-by: Ajay Garg <ajaygargnsit@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a0dd008f |
|
07-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Add a helper to get the sparse VP_SET for IPIs and TLB flushes Add a helper, kvm_get_sparse_vp_set(), to handle sanity checks related to the VARHEAD field and reading the sparse banks of a VP_SET. A future commit to reduce the memory footprint of sparse_banks will introduce more common code to the sparse bank retrieval. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
25af9081 |
|
07-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Refactor kvm_hv_flush_tlb() to reduce indentation Refactor the "extended" path of kvm_hv_flush_tlb() to reduce the nesting depth for the non-fast sparse path, and to make the code more similar to the extended path in kvm_hv_send_ipi(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bd1ba573 |
|
07-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Get the number of Hyper-V sparse banks from the VARHEAD field Get the number of sparse banks from the VARHEAD field, which the guest is required to provide as "The size of a variable header, in QWORDS.", where the variable header is: Variable Header Bytes = {Total Header Bytes - sizeof(Fixed Header)} rounded up to nearest multiple of 8 Variable HeaderSize = Variable Header Bytes / 8 In other words, the VARHEAD should match the number of sparse banks. Keep the manual count as a sanity check, but otherwise rely on the field so as to more closely align with the logic defined in the TLFS and to allow for future cleanups. Tweak the tracepoint output to use "rep_cnt" instead of simply "cnt" now that there is also "var_cnt". Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f1575642 |
|
07-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Skip APICv update if APICv is disable at the module level Bail from the APICv update paths _before_ taking apicv_update_lock if APICv is disabled at the module level. kvm_request_apicv_update() in particular is invoked from multiple paths that can be reached without APICv being enabled, e.g. svm_enable_irq_window(), and taking the rw_sem for write when APICv is disabled may introduce unnecessary contention and stalls. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211208015236.1616697-25-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
502d2bf5 |
|
29-Nov-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nVMX: Implement Enlightened MSR Bitmap feature Updating MSR bitmap for L2 is not cheap and rearly needed. TLFS for Hyper-V offers 'Enlightened MSR Bitmap' feature which allows L1 hypervisor to inform L0 when it changes MSR bitmap, this eliminates the need to examine L1's MSR bitmap for L2 every time when 'real' MSR bitmap for L2 gets constructed. Use 'vmx->nested.msr_bitmap_changed' flag to implement the feature. Note, KVM already uses 'Enlightened MSR bitmap' feature when it runs as a nested hypervisor on top of Hyper-V. The newly introduced feature is going to be used by Hyper-V guests on KVM. When the feature is enabled for Win10+WSL2, it shaves off around 700 CPU cycles from a nested vmexit cost (tight cpuid loop test). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211129094704.326635-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
46808a4c |
|
16-Nov-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu index. Unfortunately, we're about to move rework the iterator, which requires this to be upgrade to an unsigned long. Let's bite the bullet and repaint all of it in one go. Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-7-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3244867a |
|
07-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req Do not bail early if there are no bits set in the sparse banks for a non-sparse, a.k.a. "all CPUs", IPI request. Per the Hyper-V spec, it is legal to have a variable length of '0', e.g. VP_SET's BankContents in this case, if the request can be serviced without the extra info. It is possible that for a given invocation of a hypercall that does accept variable sized input headers that all the header input fits entirely within the fixed size header. In such cases the variable sized input header is zero-sized and the corresponding bits in the hypercall input should be set to zero. Bailing early results in KVM failing to send IPIs to all CPUs as expected by the guest. Fixes: 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b5aead00 |
|
23-May-2021 |
Tom Lendacky <thomas.lendacky@amd.com> |
KVM: x86: Assume a 64-bit hypercall for guests with protected state When processing a hypercall for a guest with protected state, currently SEV-ES guests, the guest CS segment register can't be checked to determine if the guest is in 64-bit mode. For an SEV-ES guest, it is expected that communication between the guest and the hypervisor is performed to shared memory using the GHCB. In order to use the GHCB, the guest must have been in long mode, otherwise writes by the guest to the GHCB would be encrypted and not be able to be comprehended by the hypervisor. Create a new helper function, is_64_bit_hypercall(), that assumes the guest is in 64-bit mode when the guest has protected state, and returns true, otherwise invoking is_64_bit_mode() to determine the mode. Update the hypercall related routines to use is_64_bit_hypercall() instead of is_64_bit_mode(). Add a WARN_ON_ONCE() to is_64_bit_mode() to catch occurences of calls to this helper function for a guest running with protected state. Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES") Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <e0b20c770c9d0d1403f23d83e785385104211f74.1621878537.git.thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
77c3323f |
|
08-Nov-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Rename kvm_lapic_enable_pv_eoi() kvm_lapic_enable_pv_eoi() is a misnomer as the function is also used to disable PV EOI. Rename it to kvm_lapic_set_pv_eoi(). No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211108152819.12485-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
187c8833 |
|
21-Oct-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Use rw_semaphore for APICv lock to allow vCPU parallelism Use a rw_semaphore instead of a mutex to coordinate APICv updates so that vCPUs responding to requests can take the lock for read and run in parallel. Using a mutex forces serialization of vCPUs even though kvm_vcpu_update_apicv() only touches data local to that vCPU or is protected by a different lock, e.g. SVM's ir_list_lock. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211022004927.1448382-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
620b2438 |
|
03-Sep-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: Make kvm_make_vcpus_request_mask() use pre-allocated cpu_kick_mask kvm_make_vcpus_request_mask() already disables preemption so just like kvm_make_all_cpus_request_except() it can be switched to using pre-allocated per-cpu cpumasks. This allows for improvements for both users of the function: in Hyper-V emulation code 'tlb_flush' can now be dropped from 'struct kvm_vcpu_hv' and kvm_make_scan_ioapic_request_mask() gets rid of dynamic allocation. cpumask_available() checks in kvm_make_vcpu_request() and kvm_kick_many_cpus() can now be dropped as they checks for an impossible condition: kvm_init() makes sure per-cpu masks are allocated. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210903075141.403071-9-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
381cecc5 |
|
03-Sep-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: Drop 'except' parameter from kvm_make_vcpus_request_mask() Both remaining callers of kvm_make_vcpus_request_mask() pass 'NULL' for 'except' parameter so it can just be dropped. No functional change intended ©. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210903075141.403071-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6470accc |
|
03-Sep-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Avoid calling kvm_make_vcpus_request_mask() with vcpu_mask==NULL In preparation to making kvm_make_vcpus_request_mask() use for_each_set_bit() switch kvm_hv_flush_tlb() to calling kvm_make_all_cpus_request() for 'all cpus' case. Note: kvm_make_all_cpus_request() (unlike kvm_make_vcpus_request_mask()) currently dynamically allocates cpumask on each call and this is suboptimal. Both kvm_make_all_cpus_request() and kvm_make_vcpus_request_mask() are going to be switched to using pre-allocated per-cpu masks. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210903075141.403071-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4eeef242 |
|
10-Sep-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Query vcpu->vcpu_idx directly and drop its accessor Read vcpu->vcpu_idx directly instead of bouncing through the one-line wrapper, kvm_vcpu_get_idx(), and drop the wrapper. The wrapper is a remnant of the original implementation and serves no purpose; remove it before it gains more users. Back when kvm_vcpu_get_idx() was added by commit 497d72d80a78 ("KVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpus"), the implementation was more than just a simple wrapper as vcpu->vcpu_idx did not exist and retrieving the index meant walking over the vCPU array to find the given vCPU. When vcpu_idx was introduced by commit 8750e72a79dd ("KVM: remember position in kvm->vcpus array"), the helper was left behind, likely to avoid extra thrash (but even then there were only two users, the original arm usage having been removed at some point in the past). No functional change intended. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210910183220.2397812-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0f250a64 |
|
10-Aug-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use APICV_INHIBIT_REASON_HYPERV is currently unconditionally forced upon SynIC activation as SynIC's AutoEOI is incompatible with APICv/AVIC. It is, however, possible to track whether the feature was actually used by the guest and only inhibit APICv/AVIC when needed. TLFS suggests a dedicated 'HV_DEPRECATING_AEOI_RECOMMENDED' flag to let Windows know that AutoEOI feature should be avoided. While it's up to KVM userspace to set the flag, KVM can help a bit by exposing global APICv/AVIC enablement. Maxim: - always set HV_DEPRECATING_AEOI_RECOMMENDED in kvm_get_hv_cpuid, since this feature can be used regardless of AVIC Paolo: - use arch.apicv_update_lock to protect the hv->synic_auto_eoi_used instead of atomic ops Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-12-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ffbe17ca |
|
09-Aug-2021 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: remove dead initialization hv_vcpu is initialized again a dozen lines below, and at this point vcpu->arch.hyperv is not valid. Remove the initializer. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4e62aa96 |
|
30-Jul-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Check if guest is allowed to use XMM registers for hypercall input TLFS states that "Availability of the XMM fast hypercall interface is indicated via the “Hypervisor Feature Identification” CPUID Leaf (0x40000003, see section 2.4.4) ... Any attempt to use this interface when the hypervisor does not indicate availability will result in a #UD fault." Implement the check for 'strict' mode (KVM_CAP_HYPERV_ENFORCE_CPUID). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210730122625.112848-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f5714bbb |
|
30-Jul-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Introduce trace_kvm_hv_hypercall_done() Hypercall failures are unusual with potentially far going consequences so it would be useful to see their results when tracing. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210730122625.112848-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2e2f1e8d |
|
30-Jul-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Check access to hypercall before reading XMM registers In case guest doesn't have access to the particular hypercall we can avoid reading XMM registers. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210730122625.112848-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
07ffaf34 |
|
09-Jun-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: nVMX: Sync all PGDs on nested transition with shadow paging Trigger a full TLB flush on behalf of the guest on nested VM-Enter and VM-Exit when VPID is disabled for L2. kvm_mmu_new_pgd() syncs only the current PGD, which can theoretically leave stale, unsync'd entries in a previous guest PGD, which could be consumed if L2 is allowed to load CR3 with PCID_NOFLUSH=1. Rename KVM_REQ_HV_TLB_FLUSH to KVM_REQ_TLB_FLUSH_GUEST so that it can be utilized for its obvious purpose of emulating a guest TLB flush. Note, there is no change the actual TLB flush executed by KVM, even though the fast PGD switch uses KVM_REQ_TLB_FLUSH_CURRENT. When VPID is disabled for L2, vpid02 is guaranteed to be '0', and thus nested_get_vpid02() will return the VPID that is shared by L1 and L2. Generate the request outside of kvm_mmu_new_pgd(), as getting the common helper to correctly identify which requested is needed is quite painful. E.g. using KVM_REQ_TLB_FLUSH_GUEST when nested EPT is in play is wrong as a TLB flush from the L1 kernel's perspective does not invalidate EPT mappings. And, by using KVM_REQ_TLB_FLUSH_GUEST, nVMX can do future simplification by moving the logic into nested_vmx_transition_tlb_flush(). Fixes: 41fab65e7c44 ("KVM: nVMX: Skip MMU sync on nested VMX transition when possible") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
445caed0 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED bit Hypercalls which use extended processor masks are only available when HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED privilege bit is exposed (and 'RECOMMENDED' is rather a misnomer). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-28-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d264eb3c |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_X64_CLUSTER_IPI_RECOMMENDED bit Hyper-V partition must possess 'HV_X64_CLUSTER_IPI_RECOMMENDED' privilege ('recommended' is rather a misnomer) to issue HVCALL_SEND_IPI hypercalls. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-27-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bb53ecb4 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED bit Hyper-V partition must possess 'HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED' privilege ('recommended' is rather a misnomer) to issue HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST/SPACE hypercalls. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-26-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a921cf83 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_DEBUGGING privilege bit Hyper-V partition must possess 'HV_DEBUGGING' privilege to issue HVCALL_POST_DEBUG_DATA/HVCALL_RETRIEVE_DEBUG_DATA/ HVCALL_RESET_DEBUG_SESSION hypercalls. Note, when SynDBG is disabled hv_check_hypercall_access() returns 'true' (like for any other unknown hypercall) so the result will be HV_STATUS_INVALID_HYPERCALL_CODE and not HV_STATUS_ACCESS_DENIED. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-25-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a60b3c59 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_SIGNAL_EVENTS privilege bit Hyper-V partition must possess 'HV_SIGNAL_EVENTS' privilege to issue HVCALL_SIGNAL_EVENT hypercalls. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-24-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4f532b7f |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_POST_MESSAGES privilege bit Hyper-V partition must possess 'HV_POST_MESSAGES' privilege to issue HVCALL_POST_MESSAGE hypercalls. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-23-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
34ef7d7b |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Check access to HVCALL_NOTIFY_LONG_SPIN_WAIT hypercall TLFS6.0b states that partition issuing HVCALL_NOTIFY_LONG_SPIN_WAIT must posess 'UseHypercallForLongSpinWait' privilege but there's no corresponding feature bit. Instead, we have "Recommended number of attempts to retry a spinlock failure before notifying the hypervisor about the failures. 0xFFFFFFFF indicates never notify." Use this to check access to the hypercall. Also, check against zero as the corresponding CPUID must be set (and '0' attempts before re-try is weird anyway). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-22-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4ad81a91 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Prepare to check access to Hyper-V hypercalls Introduce hv_check_hypercallr_access() to check if the particular hypercall should be available to guest, this will be used with KVM_CAP_HYPERV_ENFORCE_CPUID mode. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-21-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1aa8a418 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_STIMER_DIRECT_MODE_AVAILABLE privilege bit Synthetic timers can only be configured in 'direct' mode when HV_STIMER_DIRECT_MODE_AVAILABLE bit was exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-20-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d66bfa36 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Inverse the default in hv_check_msr_access() Access to all MSRs is now properly checked. To avoid 'forgetting' to properly check access to new MSRs in the future change the default to 'false' meaning 'no access'. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-19-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
17b6d517 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_FEATURE_DEBUG_MSRS_AVAILABLE privilege bit Synthetic debugging MSRs (HV_X64_MSR_SYNDBG_CONTROL, HV_X64_MSR_SYNDBG_STATUS, HV_X64_MSR_SYNDBG_SEND_BUFFER, HV_X64_MSR_SYNDBG_RECV_BUFFER, HV_X64_MSR_SYNDBG_PENDING_BUFFER, HV_X64_MSR_SYNDBG_OPTIONS) are only available to guest when HV_FEATURE_DEBUG_MSRS_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-18-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0a19c899 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE privilege bit HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4, HV_X64_MSR_CRASH_CTL are only available to guest when HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-17-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
234d01ba |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_ACCESS_REENLIGHTENMENT privilege bit HV_X64_MSR_REENLIGHTENMENT_CONTROL/HV_X64_MSR_TSC_EMULATION_CONTROL/ HV_X64_MSR_TSC_EMULATION_STATUS are only available to guest when HV_ACCESS_REENLIGHTENMENT bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-16-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9442f3bd |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_ACCESS_FREQUENCY_MSRS privilege bit HV_X64_MSR_TSC_FREQUENCY/HV_X64_MSR_APIC_FREQUENCY are only available to guest when HV_ACCESS_FREQUENCY_MSRS bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-15-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
978b5747 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_MSR_APIC_ACCESS_AVAILABLE privilege bit HV_X64_MSR_EOI, HV_X64_MSR_ICR, HV_X64_MSR_TPR, and HV_X64_MSR_VP_ASSIST_PAGE are only available to guest when HV_MSR_APIC_ACCESS_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-14-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
eba60dda |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_MSR_SYNTIMER_AVAILABLE privilege bit Synthetic timers MSRs (HV_X64_MSR_STIMER[0-3]_CONFIG, HV_X64_MSR_STIMER[0-3]_COUNT) are only available to guest when HV_MSR_SYNTIMER_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-13-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9e2715ca |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_MSR_SYNIC_AVAILABLE privilege bit SynIC MSRs (HV_X64_MSR_SCONTROL, HV_X64_MSR_SVERSION, HV_X64_MSR_SIEFP, HV_X64_MSR_SIMP, HV_X64_MSR_EOM, HV_X64_MSR_SINT0 ... HV_X64_MSR_SINT15) are only available to guest when HV_MSR_SYNIC_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-12-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a1ec661c |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_MSR_REFERENCE_TSC_AVAILABLE privilege bit HV_X64_MSR_REFERENCE_TSC is only available to guest when HV_MSR_REFERENCE_TSC_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-11-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
679008e4 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_MSR_RESET_AVAILABLE privilege bit HV_X64_MSR_RESET is only available to guest when HV_MSR_RESET_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-10-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d2ac25d4 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_MSR_VP_INDEX_AVAILABLE privilege bit HV_X64_MSR_VP_INDEX is only available to guest when HV_MSR_VP_INDEX_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-9-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c2b32867 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_MSR_TIME_REF_COUNT_AVAILABLE privilege bit HV_X64_MSR_TIME_REF_COUNT is only available to guest when HV_MSR_TIME_REF_COUNT_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-8-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b80a92ff |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_MSR_VP_RUNTIME_AVAILABLE privilege bit HV_X64_MSR_VP_RUNTIME is only available to guest when HV_MSR_VP_RUNTIME_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-7-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1561c2cb |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Honor HV_MSR_HYPERCALL_AVAILABLE privilege bit HV_X64_MSR_GUEST_OS_ID/HV_X64_MSR_HYPERCALL are only available to guest when HV_MSR_HYPERCALL_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b4128000 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Prepare to check access to Hyper-V MSRs Introduce hv_check_msr_access() to check if the particular MSR should be accessible by guest, this will be used with KVM_CAP_HYPERV_ENFORCE_CPUID mode. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
10d7bf1e |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Cache guest CPUID leaves determining features availability Limiting exposed Hyper-V features requires a fast way to check if the particular feature is exposed in guest visible CPUIDs or not. To aboid looping through all CPUID entries on every hypercall/MSR access cache the required leaves on CPUID update. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
644f7067 |
|
21-May-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Introduce KVM_CAP_HYPERV_ENFORCE_CPUID Modeled after KVM_CAP_ENFORCE_PV_FEATURE_CPUID, the new capability allows for limiting Hyper-V features to those exposed to the guest in Hyper-V CPUIDs (0x40000003, 0x40000004, ...). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d8f5537a |
|
26-May-2021 |
Siddharth Chandrasekaran <sidcha@amazon.de> |
KVM: hyper-v: Advertise support for fast XMM hypercalls Now that kvm_hv_flush_tlb() has been patched to support XMM hypercall inputs, we can start advertising this feature to guests. Cc: Alexander Graf <graf@amazon.com> Cc: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <e63fc1c61dd2efecbefef239f4f0a598bd552750.1622019134.git.sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
5974565b |
|
26-May-2021 |
Siddharth Chandrasekaran <sidcha@amazon.de> |
KVM: x86: kvm_hv_flush_tlb use inputs from XMM registers Hyper-V supports the use of XMM registers to perform fast hypercalls. This allows guests to take advantage of the improved performance of the fast hypercall interface even though a hypercall may require more than (the current maximum of) two input registers. The XMM fast hypercall interface uses six additional XMM registers (XMM0 to XMM5) to allow the guest to pass an input parameter block of up to 112 bytes. Add framework to read from XMM registers in kvm_hv_hypercall() and use the additional hypercall inputs from XMM registers in kvm_hv_flush_tlb() when possible. Cc: Alexander Graf <graf@amazon.com> Co-developed-by: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <fc62edad33f1920fe5c74dde47d7d0b4275a9012.1622019134.git.sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bd38b320 |
|
26-May-2021 |
Siddharth Chandrasekaran <sidcha@amazon.de> |
KVM: hyper-v: Collect hypercall params into struct As of now there are 7 parameters (and flags) that are used in various hyper-v hypercall handlers. There are 6 more input/output parameters passed from XMM registers which are to be added in an upcoming patch. To make passing arguments to the handlers more readable, capture all these parameters into a single structure. Cc: Alexander Graf <graf@amazon.com> Cc: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <273f7ed510a1f6ba177e61b73a5c7bfbee4a4a87.1622019133.git.sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
da6d63a0 |
|
18-May-2021 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: X86: hyper-v: Task srcu lock when accessing kvm_memslots() WARNING: suspicious RCU usage 5.13.0-rc1 #4 Not tainted ----------------------------- ./include/linux/kvm_host.h:710 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by hyperv_clock/8318: #0: ffffb6b8cb05a7d8 (&hv->hv_lock){+.+.}-{3:3}, at: kvm_hv_invalidate_tsc_page+0x3e/0xa0 [kvm] stack backtrace: CPU: 3 PID: 8318 Comm: hyperv_clock Not tainted 5.13.0-rc1 #4 Call Trace: dump_stack+0x87/0xb7 lockdep_rcu_suspicious+0xce/0xf0 kvm_write_guest_page+0x1c1/0x1d0 [kvm] kvm_write_guest+0x50/0x90 [kvm] kvm_hv_invalidate_tsc_page+0x79/0xa0 [kvm] kvm_gen_update_masterclock+0x1d/0x110 [kvm] kvm_arch_vm_ioctl+0x2a7/0xc50 [kvm] kvm_vm_ioctl+0x123/0x11d0 [kvm] __x64_sys_ioctl+0x3ed/0x9d0 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae kvm_memslots() will be called by kvm_write_guest(), so we should take the srcu lock. Fixes: e880c6ea5 (KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs) Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1621339235-11131-4-git-send-email-wanpengli@tencent.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0469f2f7 |
|
16-Mar-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment When guest opts for re-enlightenment notifications upon migration, it is in its right to assume that TSC page values never change (as they're only supposed to change upon migration and the host has to keep things as they are before it receives confirmation from the guest). This is mostly true until the guest is migrated somewhere. KVM userspace (e.g. QEMU) will trigger masterclock update by writing to HV_X64_MSR_REFERENCE_TSC, by calling KVM_SET_CLOCK,... and as TSC value and kvmclock reading drift apart (even slightly), the update causes TSC page values to change. The issue at hand is that when Hyper-V is migrated, it uses stale (cached) TSC page values to compute the difference between its own clocksource (provided by KVM) and its guests' TSC pages to program synthetic timers and in some cases, when TSC page is updated, this puts all stimer expirations in the past. This, in its turn, causes an interrupt storm and L2 guests not making much forward progress. Note, KVM doesn't fully implement re-enlightenment notification. Basically, the support for reenlightenment MSRs is just a stub and userspace is only expected to expose the feature when TSC scaling on the expected destination hosts is available. With TSC scaling, no real re-enlightenment is needed as TSC frequency doesn't change. With TSC scaling becoming ubiquitous, it likely makes little sense to fully implement re-enlightenment in KVM. Prevent TSC page from being updated after migration. In case it's not the guest who's initiating the change and when TSC page is already enabled, just keep it as it is: TSC value is supposed to be preserved across migration and TSC frequency can't change with re-enlightenment enabled. The guest is doomed anyway if any of this is not true. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210316143736.964151-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
cc9cfddb |
|
16-Mar-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Track Hyper-V TSC page status Create an infrastructure for tracking Hyper-V TSC page status, i.e. if it was updated from guest/host side or if we've failed to set it up (because e.g. guest wrote some garbage to HV_X64_MSR_REFERENCE_TSC) and there's no need to retry. Also, in a hypothetical situation when we are in 'always catchup' mode for TSC we can now avoid contending 'hv->hv_lock' on every guest enter by setting the state to HV_TSC_PAGE_BROKEN after compute_tsc_page_parameters() returns false. Check for HV_TSC_PAGE_SET state instead of '!hv->tsc_ref.tsc_sequence' in get_time_ref_counter() to properly handle the situation when we failed to write the updated TSC page values to the guest. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210316143736.964151-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e880c6ea |
|
16-Mar-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs When KVM_REQ_MASTERCLOCK_UPDATE request is issued (e.g. after migration) we need to make sure no vCPU sees stale values in PV clock structures and thus all vCPUs are kicked with KVM_REQ_CLOCK_UPDATE. Hyper-V TSC page clocksource is global and kvm_guest_time_update() only updates in on vCPU0 but this is not entirely correct: nothing blocks some other vCPU from entering the guest before we finish the update on CPU0 and it can read stale values from the page. Invalidate TSC page in kvm_gen_update_masterclock() to switch all vCPUs to using MSR based clocksource (HV_X64_MSR_TIME_REF_COUNT). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210316143736.964151-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d2547cf5 |
|
16-Mar-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS HV_X64_MSR_TSC_EMULATION_STATUS indicates whether TSC accesses are emulated after migration (to accommodate for a different host TSC frequency when TSC scaling is not supported; we don't implement this in KVM). Guest can use the same MSR to stop TSC access emulation by writing zero. Writing anything else is forbidden. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210316143736.964151-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
919f4ebc |
|
26-Feb-2021 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: x86: hyper-v: Fix Hyper-V context null-ptr-deref Reported by syzkaller: KASAN: null-ptr-deref in range [0x0000000000000140-0x0000000000000147] CPU: 1 PID: 8370 Comm: syz-executor859 Not tainted 5.11.0-syzkaller #0 RIP: 0010:synic_get arch/x86/kvm/hyperv.c:165 [inline] RIP: 0010:kvm_hv_set_sint_gsi arch/x86/kvm/hyperv.c:475 [inline] RIP: 0010:kvm_hv_irq_routing_update+0x230/0x460 arch/x86/kvm/hyperv.c:498 Call Trace: kvm_set_irq_routing+0x69b/0x940 arch/x86/kvm/../../../virt/kvm/irqchip.c:223 kvm_vm_ioctl+0x12d0/0x2800 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3959 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Hyper-V context is lazily allocated until Hyper-V specific MSRs are accessed or SynIC is enabled. However, the syzkaller testcase sets irq routing table directly w/o enabling SynIC. This results in null-ptr-deref when accessing SynIC Hyper-V context. This patch fixes it. syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=163342ccd00000 Reported-by: syzbot+6987f3b2dbd9eda95f12@syzkaller.appspotmail.com Fixes: 8f014550dfb1 ("KVM: x86: hyper-v: Make Hyper-V emulation enablement conditional") Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1614326399-5762-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
fc08b628 |
|
26-Jan-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Allocate Hyper-V context lazily Hyper-V context is only needed for guests which use Hyper-V emulation in KVM (e.g. Windows/Hyper-V guests) so we don't actually need to allocate it in kvm_arch_vcpu_create(), we can postpone the action until Hyper-V specific MSRs are accessed or SynIC is enabled. Once allocated, let's keep the context alive for the lifetime of the vCPU as an attempt to free it would require additional synchronization with other vCPUs and normally it is not supposed to happen. Note, Hyper-V style hypercall enablement is done by writing to HV_X64_MSR_GUEST_OS_ID so we don't need to worry about allocating Hyper-V context from kvm_hv_hypercall(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-15-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
8f014550 |
|
26-Jan-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Make Hyper-V emulation enablement conditional Hyper-V emulation is enabled in KVM unconditionally. This is bad at least from security standpoint as it is an extra attack surface. Ideally, there should be a per-VM capability explicitly enabled by VMM but currently it is not the case and we can't mandate one without breaking backwards compatibility. We can, however, check guest visible CPUIDs and only enable Hyper-V emulation when "Hv#1" interface was exposed in HYPERV_CPUID_INTERFACE. Note, VMMs are free to act in any sequence they like, e.g. they can try to set MSRs first and CPUIDs later so we still need to allow the host to read/write Hyper-V specific MSRs unconditionally. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-14-vkuznets@redhat.com> [Add selftest vcpu_set_hv_cpuid API to avoid breaking xen_vmcall_test. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4592b7ea |
|
26-Jan-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Allocate 'struct kvm_vcpu_hv' dynamically Hyper-V context is only needed for guests which use Hyper-V emulation in KVM (e.g. Windows/Hyper-V guests). 'struct kvm_vcpu_hv' is, however, quite big, it accounts for more than 1/4 of the total 'struct kvm_vcpu_arch' which is also quite big already. This all looks like a waste. Allocate 'struct kvm_vcpu_hv' dynamically. This patch does not bring any (intentional) functional change as we still allocate the context unconditionally but it paves the way to doing that only when needed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-13-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f2bc14b6 |
|
26-Jan-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context Currently, Hyper-V context is part of 'struct kvm_vcpu_arch' and is always available. As a preparation to allocating it dynamically, check that it is not NULL at call sites which can normally proceed without it i.e. the behavior is identical to the situation when Hyper-V emulation is not being used by the guest. When Hyper-V context for a particular vCPU is not allocated, we may still need to get 'vp_index' from there. E.g. in a hypothetical situation when Hyper-V emulation was enabled on one CPU and wasn't on another, Hyper-V style send-IPI hypercall may still be used. Luckily, vp_index is always initialized to kvm_vcpu_get_idx() and can only be changed when Hyper-V context is present. Introduce kvm_hv_get_vpindex() helper for simplification. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-12-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9ff5e030 |
|
26-Jan-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Always use to_hv_vcpu() accessor to get to 'struct kvm_vcpu_hv' As a preparation to allocating Hyper-V context dynamically, make it clear who's the user of the said context. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-11-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
72167a9d |
|
26-Jan-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Stop shadowing global 'current_vcpu' variable 'current_vcpu' variable in KVM is a per-cpu pointer to the currently scheduled vcpu. kvm_hv_flush_tlb()/kvm_hv_send_ipi() functions used to have local 'vcpu' variable to iterate over vCPUs but it's gone now and there's no need to use anything but the standard 'vcpu' as an argument. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-10-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
05f04ae4 |
|
26-Jan-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Introduce to_kvm_hv() helper Spelling '&kvm->arch.hyperv' correctly is hard. Also, this makes the code more consistent with vmx/svm where to_kvm_vmx()/to_kvm_svm() are already being used. Opportunistically change kvm_hv_msr_{get,set}_crash_{data,ctl}() and kvm_hv_msr_set_crash_data() to take 'kvm' instead of 'vcpu' as these MSRs are partition wide. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-9-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f69b55ef |
|
26-Jan-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Rename vcpu_to_hv_syndbg() to to_hv_syndbg() vcpu_to_hv_syndbg()'s argument is always 'vcpu' so there's no need to have an additional prefix. Also, this makes the code more consistent with vmx/svm where to_vmx()/to_svm() are being used. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-8-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
aafa97fd |
|
26-Jan-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Rename vcpu_to_stimer()/stimer_to_vcpu() vcpu_to_stimers()'s argument is almost always 'vcpu' so there's no need to have an additional prefix. Also, this makes the naming more consistent with to_hv_vcpu()/to_hv_synic(). Rename stimer_to_vcpu() to hv_stimer_to_vcpu() for consitency. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-7-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e0121fa2 |
|
26-Jan-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Rename vcpu_to_synic()/synic_to_vcpu() vcpu_to_synic()'s argument is almost always 'vcpu' so there's no need to have an additional prefix. Also, as this is used outside of hyper-v emulation code, add '_hv_' part to make it clear what this s. This makes the naming more consistent with to_hv_vcpu(). Rename synic_to_vcpu() to hv_synic_to_vcpu() for consistency. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ef3f3980 |
|
26-Jan-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: Rename vcpu_to_hv_vcpu() to to_hv_vcpu() vcpu_to_hv_vcpu()'s argument is almost always 'vcpu' so there's no need to have an additional prefix. Also, this makes the code more consistent with vmx/svm where to_vmx()/to_svm() are being used. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
79033beb |
|
13-Jun-2018 |
Joao Martins <joao.m.martins@oracle.com> |
KVM: x86/xen: Fix coexistence of Xen and Hyper-V hypercalls Disambiguate Xen vs. Hyper-V calls by adding 'orl $0x80000000, %eax' at the start of the Hyper-V hypercall page when Xen hypercalls are also enabled. That bit is reserved in the Hyper-V ABI, and those hypercall numbers will never be used by Xen (because it does precisely the same trick). Switch to using kvm_vcpu_write_guest() while we're at it, instead of open-coding it. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
|
#
b3646477 |
|
14-Jan-2021 |
Jason Baron <jbaron@akamai.com> |
KVM: x86: use static calls to reduce kvm_x86_ops overhead Convert kvm_x86_ops to use static calls. Note that all kvm_x86_ops are covered here except for 'pmu_ops and 'nested ops'. Here are some numbers running cpuid in a loop of 1 million calls averaged over 5 runs, measured in the vm (lower is better). Intel Xeon 3000MHz: |default |mitigations=off ------------------------------------- vanilla |.671s |.486s static call|.573s(-15%)|.458s(-6%) AMD EPYC 2500MHz: |default |mitigations=off ------------------------------------- vanilla |.710s |.609s static call|.664s(-6%) |.609s(0%) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Message-Id: <e057bf1b8a7ad15652df6eeba3f907ae758d3399.1610680941.git.jbaron@akamai.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c21d54f0 |
|
29-Sep-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: allow KVM_GET_SUPPORTED_HV_CPUID as a system ioctl KVM_GET_SUPPORTED_HV_CPUID is a vCPU ioctl but its output is now independent from vCPU and in some cases VMMs may want to use it as a system ioctl instead. In particular, QEMU doesn CPU feature expansion before any vCPU gets created so KVM_GET_SUPPORTED_HV_CPUID can't be used. Convert KVM_GET_SUPPORTED_HV_CPUID to 'dual' system/vCPU ioctl with the same meaning. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200929150944.1235688-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
dbcf3f96 |
|
24-Sep-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: disallow configuring SynIC timers with no SynIC Hyper-V Synthetic timers require SynIC but we don't seem to check that upon HV_X64_MSR_STIMER[X]_CONFIG/HV_X64_MSR_STIMER0_COUNT writes. Make the behavior match synic_set_msr(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200924145757.1035782-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e1471463 |
|
26-Sep-2020 |
Joseph Salisbury <joseph.salisbury@microsoft.com> |
x86/hyperv: Remove aliases with X64 in their name In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b removed the "X64" in the symbol names so they would make sense for both x86 and ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h so that existing x86 code would continue to compile. As a cleanup, update the x86 code to use the symbols without the "X64", then remove the aliases. There's no functional change. Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com> Link: https://lore.kernel.org/r/1601130386-11111-1-git-send-email-jsalisbury@linux.microsoft.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
#
dfc53baa |
|
26-Sep-2020 |
Joseph Salisbury <joseph.salisbury@microsoft.com> |
x86/hyperv: Remove aliases with X64 in their name In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b removed the "X64" in the symbol names so they would make sense for both x86 and ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h so that existing x86 code would continue to compile. As a cleanup, update the x86 code to use the symbols without the "X64", then remove the aliases. There's no functional change. Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/1601130386-11111-1-git-send-email-jsalisbury@linux.microsoft.com
|
#
df561f66 |
|
23-Aug-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
#
99b48ecc |
|
17-Jul-2020 |
Jon Doron <arilou@gmail.com> |
x86/kvm/hyper-v: Synic default SCONTROL MSR needs to be enabled Based on an analysis of the HyperV firmwares (Gen1 and Gen2) it seems like the SCONTROL is not being set to the ENABLED state as like we have thought. Also from a test done by Vitaly Kuznetsov, running a nested HyperV it was concluded that the first access to the SCONTROL MSR with a read resulted with the value of 0x1, aka HV_SYNIC_CONTROL_ENABLE. It's important to note that this diverges from the value states in the HyperV TLFS of 0. Signed-off-by: Jon Doron <arilou@gmail.com> Message-Id: <20200717125238.1103096-2-arilou@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9eb41c52 |
|
18-Feb-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
x86: kvm_hv_set_msr(): use __put_user() instead of 32bit __clear_user() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
b187038b |
|
29-May-2020 |
Jon Doron <arilou@gmail.com> |
x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls There is another mode for the synthetic debugger which uses hypercalls to send/recv network data instead of the MSR interface. This interface is much slower and less recommended since you might get a lot of VMExits while KDVM polling for new packets to recv, rather than simply checking the pending page to see if there is data avialble and then request. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Jon Doron <arilou@gmail.com> Message-Id: <20200529134543.1127440-6-arilou@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
45c38973 |
|
29-May-2020 |
Jon Doron <arilou@gmail.com> |
x86/kvm/hyper-v: enable hypercalls regardless of hypercall page Microsoft's kdvm.dll dbgtransport module does not respect the hypercall page and simply identifies the CPU being used (AMD/Intel) and according to it simply makes hypercalls with the relevant instruction (vmmcall/vmcall respectively). The relevant function in kdvm is KdHvConnectHypervisor which first checks if the hypercall page has been enabled via HV_X64_MSR_HYPERCALL_ENABLE, and in case it was not it simply sets the HV_X64_MSR_GUEST_OS_ID to 0x1000101010001 which means: build_number = 0x0001 service_version = 0x01 minor_version = 0x01 major_version = 0x01 os_id = 0x00 (Undefined) vendor_id = 1 (Microsoft) os_type = 0 (A value of 0 indicates a proprietary, closed source OS) and starts issuing the hypercall without setting the hypercall page. To resolve this issue simply enable hypercalls also if the guest_os_id is not 0. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Jon Doron <arilou@gmail.com> Message-Id: <20200529134543.1127440-5-arilou@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f97f5a56 |
|
29-May-2020 |
Jon Doron <arilou@gmail.com> |
x86/kvm/hyper-v: Add support for synthetic debugger interface Add support for Hyper-V synthetic debugger (syndbg) interface. The syndbg interface is using MSRs to emulate a way to send/recv packets data. The debug transport dll (kdvm/kdnet) will identify if Hyper-V is enabled and if it supports the synthetic debugger interface it will attempt to use it, instead of trying to initialize a network adapter. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Jon Doron <arilou@gmail.com> Message-Id: <20200529134543.1127440-4-arilou@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7357b1df |
|
22-Apr-2020 |
Michael Kelley <mikelley@microsoft.com> |
KVM: x86: hyperv: Remove duplicate definitions of Reference TSC Page The Hyper-V Reference TSC Page structure is defined twice. struct ms_hyperv_tsc_page has padding out to a full 4 Kbyte page size. But the padding is not needed because the declaration includes a union with HV_HYP_PAGE_SIZE. KVM uses the second definition, which is struct _HV_REFERENCE_TSC_PAGE, because it does not have the padding. Fix the duplication by removing the padding from ms_hyperv_tsc_page. Fix up the KVM code to use it. Remove the no longer used struct _HV_REFERENCE_TSC_PAGE. There is no functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20200422195737.10223-2-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
#
54163a34 |
|
06-May-2020 |
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> |
KVM: Introduce kvm_make_all_cpus_request_except() This allows making request to all other vcpus except the one specified in the parameter. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <1588771076-73790-2-git-send-email-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
33b22172 |
|
17-Apr-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: move nested-related kvm_x86_ops to a separate struct Clean up some of the patching of kvm_x86_ops, by moving kvm_x86_ops related to nested virtualization into a separate struct. As a result, these ops will always be non-NULL on VMX. This is not a problem: * check_nested_events is only called if is_guest_mode(vcpu) returns true * get_nested_state treats VMXOFF state the same as nested being disabled * set_nested_state fails if you attempt to set nested state while nesting is disabled * nested_enable_evmcs could already be called on a CPU without VMX enabled in CPUID. * nested_get_evmcs_version was fixed in the previous patch Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0baedd79 |
|
24-Mar-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest() Hyper-V PV TLB flush mechanism does TLB flush on behalf of the guest so doing tlb_flush_all() is an overkill, switch to using tlb_flush_guest() (just like KVM PV TLB flush mechanism) instead. Introduce KVM_REQ_HV_TLB_FLUSH to support the change. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
afaf0b2f |
|
21-Mar-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection Replace the kvm_x86_ops pointer in common x86 with an instance of the struct to save one pointer dereference when invoking functions. Copy the struct by value to set the ops during kvm_init(). Arbitrarily use kvm_x86_ops.hardware_enable to track whether or not the ops have been initialized, i.e. a vendor KVM module has been loaded. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200321202603.19355-7-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f4fdc0a2 |
|
14-Nov-2019 |
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> |
kvm: x86: hyperv: Use APICv update request interface Since disabling APICv has to be done for all vcpus on AMD-based system, adopt the newly introduced kvm_request_apicv_update() interface, and introduce a new APICV_INHIBIT_REASON_HYPERV. Also, remove the kvm_vcpu_deactivate_apicv() since no longer used. Cc: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
86187937 |
|
11-Dec-2019 |
Marios Pomonis <pomonis@google.com> |
KVM: x86: Protect kvm_hv_msr_[get|set]_crash_data() from Spectre-v1/L1TF attacks This fixes Spectre-v1/L1TF vulnerabilities in kvm_hv_msr_get_crash_data() and kvm_hv_msr_set_crash_data(). These functions contain index computations that use the (attacker-controlled) MSR number. Fixes: e7d9513b60e8 ("kvm/x86: added hyper-v crash msrs into kvm hyperv context") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2f9f5cdd |
|
10-Dec-2019 |
Miaohe Lin <linmiaohe@huawei.com> |
KVM: hyperv: Fix some typos in vcpu unimpl info Fix some typos in vcpu unimpl info. It should be unhandled rather than uhandled. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
67b0ae43 |
|
10-Dec-2019 |
Miaohe Lin <linmiaohe@huawei.com> |
KVM: Fix some comment typos and missing parentheses Fix some typos and add missing parentheses in the comments. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
59508b30 |
|
04-Dec-2019 |
Peter Xu <peterx@redhat.com> |
KVM: X86: Move irrelevant declarations out of ioapic.h kvm_apic_match_dest() is declared in both ioapic.h and lapic.h. Remove the declaration in ioapic.h. kvm_apic_compare_prio() is declared in ioapic.h but defined in lapic.c. Move the declaration to lapic.h. kvm_irq_delivery_to_apic() is declared in ioapic.h but defined in irq_comm.c. Move the declaration to irq.h. hyperv.c needs to use kvm_irq_delivery_to_apic(). Include irq.h in hyperv.c. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b2d8b167 |
|
16-Sep-2019 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when SMT is impossible Hyper-V 2019 doesn't expose MD_CLEAR CPUID bit to guests when it cannot guarantee that two virtual processors won't end up running on sibling SMT threads without knowing about it. This is done as an optimization as in this case there is nothing the guest can do to protect itself against MDS and issuing additional flush requests is just pointless. On bare metal the topology is known, however, when Hyper-V is running nested (e.g. on top of KVM) it needs an additional piece of information: a confirmation that the exposed topology (wrt vCPU placement on different SMT threads) is trustworthy. NoNonArchitecturalCoreSharing (CPUID 0x40000004 EAX bit 18) is described in TLFS as follows: "Indicates that a virtual processor will never share a physical core with another virtual processor, except for virtual processors that are reported as sibling SMT threads." From KVM we can give such guarantee in two cases: - SMT is unsupported or forcefully disabled (just 'disabled' doesn't work as it can become re-enabled during the lifetime of the guest). - vCPUs are properly pinned so the scheduler won't put them on sibling SMT threads (when they're not reported as such). This patch reports NoNonArchitecturalCoreSharing bit in to userspace in the first case. The second case is outside of KVM's domain of responsibility (as vCPU pinning is actually done by someone who manages KVM's userspace - e.g. libvirt pinning QEMU threads). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a073d7e3 |
|
16-Sep-2019 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: hyperv: Fix Direct Synthetic timers assert an interrupt w/o lapic_in_kernel Reported by syzkaller: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:__apic_accept_irq+0x46/0x740 arch/x86/kvm/lapic.c:1029 Call Trace: kvm_apic_set_irq+0xb4/0x140 arch/x86/kvm/lapic.c:558 stimer_notify_direct arch/x86/kvm/hyperv.c:648 [inline] stimer_expiration arch/x86/kvm/hyperv.c:659 [inline] kvm_hv_process_stimers+0x594/0x1650 arch/x86/kvm/hyperv.c:686 vcpu_enter_guest+0x2b2a/0x54b0 arch/x86/kvm/x86.c:7896 vcpu_run+0x393/0xd40 arch/x86/kvm/x86.c:8152 kvm_arch_vcpu_ioctl_run+0x636/0x900 arch/x86/kvm/x86.c:8360 kvm_vcpu_ioctl+0x6cf/0xaf0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2765 The testcase programs HV_X64_MSR_STIMERn_CONFIG/HV_X64_MSR_STIMERn_COUNT, in addition, there is no lapic in the kernel, the counters value are small enough in order that kvm_hv_process_stimers() inject this already-expired timer interrupt into the guest through lapic in the kernel which triggers the NULL deferencing. This patch fixes it by don't advertise direct mode synthetic timers and discarding the inject when lapic is not in kernel. syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=1752fe0a600000 Reported-by: syzbot+dff25ee91f0c7d5c1695@syzkaller.appspotmail.com Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ea152987 |
|
27-Aug-2019 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyper-v: don't crash on KVM_GET_SUPPORTED_HV_CPUID when kvm_intel.nested is disabled If kvm_intel is loaded with nested=0 parameter an attempt to perform KVM_GET_SUPPORTED_HV_CPUID results in OOPS as nested_get_evmcs_version hook in kvm_x86_ops is NULL (we assign it in nested_vmx_hardware_setup() and this only happens in case nested is enabled). Check that kvm_x86_ops->nested_get_evmcs_version is not NULL before calling it. With this, we can remove the stub from svm as it is no longer needed. Cc: <stable@vger.kernel.org> Fixes: e2e871ab2f02 ("x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helper") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
f4e4805e |
|
12-Jul-2019 |
Arnd Bergmann <arnd@arndb.de> |
x86: kvm: avoid -Wsometimes-uninitized warning Clang notices a code path in which some variables are never initialized, but fails to figure out that this can never happen on i386 because is_64_bit_mode() always returns false. arch/x86/kvm/hyperv.c:1610:6: error: variable 'ingpa' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] if (!longmode) { ^~~~~~~~~ arch/x86/kvm/hyperv.c:1632:55: note: uninitialized use occurs here trace_kvm_hv_hypercall(code, fast, rep_cnt, rep_idx, ingpa, outgpa); ^~~~~ arch/x86/kvm/hyperv.c:1610:2: note: remove the 'if' if its condition is always true if (!longmode) { ^~~~~~~~~~~~~~~ arch/x86/kvm/hyperv.c:1595:18: note: initialize the variable 'ingpa' to silence this warning u64 param, ingpa, outgpa, ret = HV_STATUS_SUCCESS; ^ = 0 arch/x86/kvm/hyperv.c:1610:6: error: variable 'outgpa' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] arch/x86/kvm/hyperv.c:1610:6: error: variable 'param' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] Flip the condition around to avoid the conditional execution on i386. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
20c8ccb1 |
|
04-Jun-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 Based on 1 normalized pattern(s): this work is licensed under the terms of the gnu gpl version 2 see the copying file in the top level directory extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 35 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.797835076@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
de3cd117 |
|
30-Apr-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Omit caching logic for always-available GPRs Except for RSP and RIP, which are held in VMX's VMCS, GPRs are always treated "available and dirtly" on both VMX and SVM, i.e. are unconditionally loaded/saved immediately before/after VM-Enter/VM-Exit. Eliminating the unnecessary caching code reduces the size of KVM by a non-trivial amount, much of which comes from the most common code paths. E.g. on x86_64, kvm_emulate_cpuid() is reduced from 342 to 182 bytes and kvm_emulate_hypercall() from 1362 to 1143, with the total size of KVM dropping by ~1000 bytes. With CONFIG_RETPOLINE=y, the numbers are even more pronounced, e.g.: 353->182, 1418->1172 and well over 2000 bytes. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
da66761c |
|
20-Mar-2019 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012 It was reported that with some special Multi Processor Group configuration, e.g: bcdedit.exe /set groupsize 1 bcdedit.exe /set maxgroup on bcdedit.exe /set groupaware on for a 16-vCPU guest WS2012 shows BSOD on boot when PV TLB flush mechanism is in use. Tracing kvm_hv_flush_tlb immediately reveals the issue: kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2 The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES, however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified. We don't flush anything and apparently it's not what Windows expects. TLFS doesn't say anything about such requests and newer Windows versions seem to be unaffected. This all feels like a WS2012 bug, which is, however, easy to workaround in KVM: let's flush everything when we see an empty flush request, over-flushing doesn't hurt. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
013cc6eb |
|
13-Mar-2019 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init When userspace initializes guest vCPUs it may want to zero all supported MSRs including Hyper-V related ones including HV_X64_MSR_STIMERn_CONFIG/ HV_X64_MSR_STIMERn_COUNT. With commit f3b138c5d89a ("kvm/x86: Update SynIC timers on guest entry only") we began doing stimer_mark_pending() unconditionally on every config change. The issue I'm observing manifests itself as following: - Qemu writes 0 to STIMERn_{CONFIG,COUNT} MSRs and marks all stimers as pending in stimer_pending_bitmap, arms KVM_REQ_HV_STIMER; - kvm_hv_has_stimer_pending() starts returning true; - kvm_vcpu_has_events() starts returning true; - kvm_arch_vcpu_runnable() starts returning true; - when kvm_arch_vcpu_ioctl_run() gets into (vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED) case: - kvm_vcpu_block() gets in 'kvm_vcpu_check_block(vcpu) < 0' and returns immediately, avoiding normal wait path; - -EAGAIN is returned from kvm_arch_vcpu_ioctl_run() immediately forcing userspace to retry. So instead of normal wait path we get a busy loop on all secondary vCPUs before they get INIT signal. This seems to be undesirable, especially given that this happens even when Hyper-V extensions are not used. Generally, it seems to be pointless to mark an stimer as pending in stimer_pending_bitmap and arm KVM_REQ_HV_STIMER as the only thing kvm_hv_process_stimers() will do is clear the corresponding bit. We may just not mark disabled timers as pending instead. Fixes: f3b138c5d89a ("kvm/x86: Update SynIC timers on guest entry only") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
254272ce |
|
11-Feb-2019 |
Ben Gardon <bgardon@google.com> |
kvm: x86: Add memcg accounting to KVM allocations There are many KVM kernel memory allocations which are tied to the life of the VM process and should be charged to the VM process's cgroup. If the allocations aren't tied to the process, the OOM killer will not know that killing the process will free the associated kernel memory. Add __GFP_ACCOUNT flags to many of the allocations which are not yet being charged to the VM process's cgroup. Tested: Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch introduced no new failures. Ran a kernel memory accounting test which creates a VM to touch memory and then checks that the kernel memory allocated for the process is within certain bounds. With this patch we account for much more of the vmalloc and slab memory allocated for the VM. There remain a few allocations which should be charged to the VM's cgroup but are not. In x86, they include: vcpu->arch.pio_data There allocations are unaccounted in this patch because they are mapped to userspace, and accounting them to a cgroup causes problems. This should be addressed in a future patch. Signed-off-by: Ben Gardon <bgardon@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b2869f28 |
|
24-Jan-2019 |
Gustavo A. R. Silva <gustavo@embeddedor.com> |
KVM: x86: Mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: arch/x86/kvm/lapic.c:1037:27: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/lapic.c:1876:3: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/hyperv.c:1637:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/svm.c:4396:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/mmu.c:4372:36: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/x86.c:3835:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/x86.c:7938:23: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/vmx/vmx.c:2015:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/vmx/vmx.c:1773:6: warning: this statement may fall through [-Wimplicit-fallthrough=] Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f1adceaf |
|
24-Jan-2019 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm/hyper-v: recommend using eVMCS only when it is enabled We shouldn't probably be suggesting using Enlightened VMCS when it's not enabled (not supported from guest's point of view). Hyper-V on KVM seems to be fine either way but let's be consistent. Fixes: 2bc39970e932 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID") Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1998fd32 |
|
24-Jan-2019 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm/hyper-v: don't recommend doing reset via synthetic MSR System reset through synthetic MSR is not recommended neither by genuine Hyper-V nor my QEMU. Fixes: 2bc39970e932 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9699f970 |
|
24-Jan-2019 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm/hyper-v: don't announce GUEST IDLE MSR support HV_X64_MSR_GUEST_IDLE_AVAILABLE appeared in kvm_vcpu_ioctl_get_hv_cpuid() by mistake: it announces support for HV_X64_MSR_GUEST_IDLE (0x400000F0) which we don't support in KVM (yet). Fixes: 2bc39970e932 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
87a8d795 |
|
05-Dec-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/hyper-v: Stop caring about EOI for direct stimers Turns out we over-engineered Direct Mode for stimers a bit: unlike traditional stimers where we may want to try to re-inject the message upon EOI, Direct Mode stimers just set the irq in APIC and kvm_apic_set_irq() fails only when APIC is disabled (see APIC_DM_FIXED case in __apic_accept_irq()). Remove the redundant part. Suggested-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
08a800ac |
|
26-Nov-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm/hyper-v: avoid open-coding stimer_mark_pending() in kvm_hv_notify_acked_sint() stimers_pending optimization only helps us to avoid multiple kvm_make_request() calls. This doesn't happen very often and these calls are very cheap in the first place, remove open-coded version of stimer_mark_pending() from kvm_hv_notify_acked_sint(). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
8644f771 |
|
26-Nov-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm/hyper-v: direct mode for synthetic timers Turns out Hyper-V on KVM (as of 2016) will only use synthetic timers if direct mode is available. With direct mode we notify the guest by asserting APIC irq instead of sending a SynIC message. The implementation uses existing vec_bitmap for letting lapic code know that we're interested in the particular IRQ's EOI request. We assume that the same APIC irq won't be used by the guest for both direct mode stimer and as sint source (especially with AutoEOI semantics). It is unclear how things should be handled if that's not true. Direct mode is also somewhat less expensive; in my testing stimer_send_msg() takes not less than 1500 cpu cycles and stimer_notify_direct() can usually be done in 300-400. WS2016 without Hyper-V, however, always sticks to non-direct version. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6a058a1e |
|
26-Nov-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm/hyper-v: use stimer config definition from hyperv-tlfs.h As a preparation to implementing Direct Mode for Hyper-V synthetic timers switch to using stimer config definition from hyperv-tlfs.h. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2bc39970 |
|
10-Dec-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID With every new Hyper-V Enlightenment we implement we're forced to add a KVM_CAP_HYPERV_* capability. While this approach works it is fairly inconvenient: the majority of the enlightenments we do have corresponding CPUID feature bit(s) and userspace has to know this anyways to be able to expose the feature to the guest. Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one cap to rule them all!") returning all Hyper-V CPUID feature leaves. Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible: Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000, 0x40000001) and we would probably confuse userspace in case we decide to return these twice. KVM_CAP_HYPERV_CPUID's number is interim: we're intended to drop KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a4987def |
|
10-Dec-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/hyper-v: Do some housekeeping in hyperv-tlfs.h hyperv-tlfs.h is a bit messy: CPUID feature bits are not always sorted, it's hard to get which CPUID they belong to, some items are duplicated (e.g. HV_X64_MSR_CRASH_CTL_NOTIFY/HV_CRASH_CTL_CRASH_NOTIFY). Do some housekeeping work. While on it, replace all (1 << X) with BIT(X) macro. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7deec5e0 |
|
10-Dec-2018 |
Roman Kagan <rkagan@virtuozzo.com> |
x86: kvm: hyperv: don't retry message delivery for periodic timers The SynIC message delivery protocol allows the message originator to request, should the message slot be busy, to be notified when it's free. However, this is unnecessary and even undesirable for messages generated by SynIC timers in periodic mode: if the period is short enough compared to the time the guest spends in the timer interrupt handler, so the timer ticks start piling up, the excessive interactions due to this notification and retried message delivery only makes the things worse. [This was observed, in particular, with Windows L2 guests setting (temporarily) the periodic timer to 2 kHz, and spending hundreds of microseconds in the timer interrupt handler due to several L2->L1 exits; under some load in L0 this could exceed 500 us so the timer ticks started to pile up and the guest livelocked.] Relieve the situation somewhat by not retrying message delivery for periodic SynIC timers. This appears to remain within the "lazy" lost ticks policy for SynIC timers as implemented in KVM. Note that it doesn't solve the fundamental problem of livelocking the guest with a periodic timer whose period is smaller than the time needed to process a tick, but it makes it a bit less likely to be triggered. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3a0e7731 |
|
10-Dec-2018 |
Roman Kagan <rkagan@virtuozzo.com> |
x86: kvm: hyperv: simplify SynIC message delivery SynIC message delivery is somewhat overengineered: it pretends to follow the ordering rules when grabbing the message slot, using atomic operations and all that, but does it incorrectly and unnecessarily. The correct order would be to first set .msg_pending, then atomically replace .message_type if it was zero, and then clear .msg_pending if the previous step was successful. But this all is done in vcpu context so the whole update looks atomic to the guest (it's assumed to only access the message page from this cpu), and therefore can be done in whatever order is most convenient (and is also the reason why the incorrect order didn't trigger any bugs so far). While at this, also switch to kvm_vcpu_{read,write}_guest_page, and drop the no longer needed synic_clear_sint_msg_pending. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
12e0c618 |
|
16-Oct-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm/hyperv: don't clear VP assist pages on init VP assist pages may hold valuable data which needs to be preserved across migration. Clean PV EOI portion of the data on init, the guest is responsible for making sure there's no garbage in the rest. This will be used for nVMX migration, eVMCS address needs to be preserved. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
72bbf935 |
|
16-Oct-2018 |
Ladi Prosek <lprosek@redhat.com> |
KVM: hyperv: define VP assist page helpers The state related to the VP assist page is still managed by the LAPIC code in the pv_eoi field. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f21dd494 |
|
10-Oct-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: optimize sparse VP set processing Rewrite kvm_hv_flush_tlb()/send_ipi_vcpus_mask() making them cleaner and somewhat more optimal. hv_vcpu_in_sparse_set() is converted to sparse_set_to_vcpu_mask() which copies sparse banks u64-at-a-time and then, depending on the num_mismatched_vp_indexes value, returns immediately or does vp index to vcpu index conversion by walking all vCPUs. To support the change and make kvm_hv_send_ipi() look similar to kvm_hv_flush_tlb() send_ipi_vcpus_mask() is introduced. Suggested-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e6b6c483 |
|
08-Oct-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: fix 'tlb_lush' typo Regardless of whether your TLB is lush or not it still needs flushing. Reported-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
214ff83d |
|
26-Sep-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: implement PV IPI send hypercalls Using hypercall for sending IPIs is faster because this allows to specify any number of vCPUs (even > 64 with sparse CPU set), the whole procedure will take only one VMEXIT. Current Hyper-V TLFS (v5.0b) claims that HvCallSendSyntheticClusterIpi hypercall can't be 'fast' (passing parameters through registers) but apparently this is not true, Windows always uses it as 'fast' so we need to support that. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2cefc5fe |
|
26-Sep-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: optimize kvm_hv_flush_tlb() for vp_index == vcpu_idx case VP inedx almost always matches VCPU and when it does it's faster to walk the sparse set instead of all vcpus. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0b0a31ba |
|
26-Sep-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: valid_bank_mask should be 'u64' This probably doesn't matter much (KVM_MAX_VCPUS is much lower nowadays) but valid_bank_mask is really u64 and not unsigned long. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
87ee613d |
|
26-Sep-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: keep track of mismatched VP indexes In most common cases VP index of a vcpu matches its vcpu index. Userspace is, however, free to set any mapping it wishes and we need to account for that when we need to find a vCPU with a particular VP index. To keep search algorithms optimal in both cases introduce 'num_mismatched_vp_indexes' counter showing how many vCPUs with mismatching VP index we have. In case the counter is zero we can assume vp_index == vcpu_idx. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1779a39f |
|
26-Sep-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: consistently use 'hv_vcpu' for 'struct kvm_vcpu_hv' variables Rename 'hv' to 'hv_vcpu' in kvm_hv_set_msr/kvm_hv_get_msr(); 'hv' is 'reserved' for 'struct kvm_hv' variables across the file. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a812297c |
|
21-Aug-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: optimize 'all cpus' case in kvm_hv_flush_tlb() We can use 'NULL' to represent 'all cpus' case in kvm_make_vcpus_request_mask() and avoid building vCPU mask with all vCPUs. Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9170200e |
|
21-Aug-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: enforce vp_index < KVM_MAX_VCPUS Hyper-V TLFS (5.0b) states: > Virtual processors are identified by using an index (VP index). The > maximum number of virtual processors per partition supported by the > current implementation of the hypervisor can be obtained through CPUID > leaf 0x40000005. A virtual processor index must be less than the > maximum number of virtual processors per partition. Forbid userspace to set VP_INDEX above KVM_MAX_VCPUS. get_vcpu_by_vpidx() can now be optimized to bail early when supplied vpidx is >= KVM_MAX_VCPUS. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
44883f01 |
|
26-Jul-2018 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: ensure all MSRs can always be KVM_GET/SET_MSR'd Some of the MSRs returned by GET_MSR_INDEX_LIST currently cannot be sent back to KVM_GET_MSR and/or KVM_SET_MSR; either they can never be sent back, or you they are only accepted under special conditions. This makes the API a pain to use. To avoid this pain, this patch makes it so that the result of the get-list ioctl can always be used for host-initiated get and set. Since we don't have a separate way to check for read-only MSRs, this means some Hyper-V MSRs are ignored when written. Arguably they should not even be in the result of GET_MSR_INDEX_LIST, but I am leaving there in case userspace is using the outcome of GET_MSR_INDEX_LIST to derive the support for the corresponding Hyper-V feature. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c7012676 |
|
16-May-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation Implement HvFlushVirtualAddress{List,Space}Ex hypercalls in the same way we've implemented non-EX counterparts. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> [Initialized valid_bank_mask to silence misguided GCC warnigs. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
e2f11f42 |
|
16-May-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation Implement HvFlushVirtualAddress{List,Space} hypercalls in a simplistic way: do full TLB flush with KVM_REQ_TLB_FLUSH and kick vCPUs which are currently IN_GUEST_MODE. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
56b9ae78 |
|
16-May-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: do rep check for each hypercall separately Prepare to support TLB flush hypercalls, some of which are REP hypercalls. Also, return HV_STATUS_INVALID_HYPERCALL_INPUT as it seems more appropriate. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
142c95da |
|
16-May-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: hyperv: use defines when parsing hypercall parameters Avoid open-coding offsets for hypercall input parameters, we already have defines for them. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
696ca779 |
|
24-May-2018 |
Radim Krčmář <rkrcmar@redhat.com> |
KVM: x86: fix #UD address of failed Hyper-V hypercalls If the hypercall was called from userspace or real mode, KVM injects #UD and then advances RIP, so it looks like #UD was caused by the following instruction. This probably won't cause more than confusion, but could give an unexpected access to guest OS' instruction emulator. Also, refactor the code to count hv hypercalls that were handled by the virt userspace. Fixes: 6356ee0c9602 ("x86: Delay skip of emulated hypercall instruction") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
452a68d0 |
|
07-May-2018 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: hyperv: idr_find needs RCU protection Even though the eventfd is released after the KVM SRCU grace period elapses, the conn_to_evt data structure itself is not; it uses RCU internally, instead. Fix the read-side critical section to happen under rcu_read_lock/unlock; the result is still protected by vcpu->kvm->srcu. Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6356ee0c |
|
29-Apr-2018 |
Marian Rotariu <mrotariu@bitdefender.com> |
x86: Delay skip of emulated hypercall instruction The IP increment should be done after the hypercall emulation, after calling the various handlers. In this way, these handlers can accurately identify the the IP of the VMCALL if they need it. This patch keeps the same functionality for the Hyper-V handler which does not use the return code of the standard kvm_skip_emulated_instruction() call. Signed-off-by: Marian Rotariu <mrotariu@bitdefender.com> [Hyper-V hypercalls also need kvm_skip_emulated_instruction() - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d4abc577 |
|
20-Mar-2018 |
Ladi Prosek <lprosek@redhat.com> |
x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE The assist page has been used only for the paravirtual EOI so far, hence the "APIC" in the MSR name. Renaming to match the Hyper-V TLFS where it's called "Virtual VP Assist MSR". Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
d32ef547 |
|
17-Mar-2018 |
Dan Carpenter <dan.carpenter@oracle.com> |
kvm: x86: hyperv: delete dead code in kvm_hv_hypercall() "rep_done" is always zero so the "(((u64)rep_done & 0xfff) << 32)" expression is just zero. We can remove the "res" temporary variable as well and just use "ret" directly. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
915e6f78 |
|
01-Mar-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm/hyper-v: inject #GP only when invalid SINTx vector is unmasked Hyper-V 2016 on KVM with SynIC enabled doesn't boot with the following trace: kvm_entry: vcpu 0 kvm_exit: reason MSR_WRITE rip 0xfffff8000131c1e5 info 0 0 kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000090 data 0x10000 host 0 kvm_msr: msr_write 40000090 = 0x10000 (#GP) kvm_inj_exception: #GP (0x0) KVM acts according to the following statement from TLFS: " 11.8.4 SINTx Registers ... Valid values for vector are 16-255 inclusive. Specifying an invalid vector number results in #GP. " However, I checked and genuine Hyper-V doesn't #GP when we write 0x10000 to SINTx. I checked with Microsoft and they confirmed that if either the Masked bit (bit 16) or the Polling bit (bit 18) is set to 1, then they ignore the value of Vector. Make KVM act accordingly. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
98f65ad4 |
|
01-Mar-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm/hyper-v: remove stale entries from vec_bitmap/auto_eoi_bitmap on vector change When a new vector is written to SINx we update vec_bitmap/auto_eoi_bitmap but we forget to remove old vector from these masks (in case it is not present in some other SINTx). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
a2e164e7 |
|
01-Mar-2018 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm/hyper-v: add reenlightenment MSRs support Nested Hyper-V/Windows guest running on top of KVM will use TSC page clocksource in two cases: - L0 exposes invariant TSC (CPUID.80000007H:EDX[8]). - L0 provides Hyper-V Reenlightenment support (CPUID.40000003H:EAX[13]). Exposing invariant TSC effectively blocks migration to hosts with different TSC frequencies, providing reenlightenment support will be needed when we start migrating nested workloads. Implement rudimentary support for reenlightenment MSRs. For now, these are just read/write MSRs with no effect. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
faeb7833 |
|
01-Feb-2018 |
Roman Kagan <rkagan@virtuozzo.com> |
kvm: x86: hyperv: guest->host event signaling via eventfd In Hyper-V, the fast guest->host notification mechanism is the SIGNAL_EVENT hypercall, with a single parameter of the connection ID to signal. Currently this hypercall incurs a user exit and requires the userspace to decode the parameters and trigger the notification of the potentially different I/O context. To avoid the costly user exit, process this hypercall and signal the corresponding eventfd in KVM, similar to ioeventfd. The association between the connection id and the eventfd is established via the newly introduced KVM_HYPERV_EVENTFD ioctl, and maintained in an (srcu-protected) IDR. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: David Hildenbrand <david@redhat.com> [asm/hyperv.h changes approved by KY Srinivasan. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
cbc0236a |
|
01-Feb-2018 |
Roman Kagan <rkagan@virtuozzo.com> |
kvm: x86: factor out kvm.arch.hyperv (de)init Move kvm.arch.hyperv initialization and cleanup to separate functions. For now only a mutex is inited in the former, and the latter is empty; more stuff will go in there in a followup patch. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
de63ad4c |
|
07-Aug-2017 |
Longpeng(Mike) <longpeng2@huawei.com> |
KVM: X86: implement the logic for spinlock optimization get_cpl requires vcpu_load, so we must cache the result (whether the vcpu was preempted when its cpl=0) in kvm_vcpu_arch. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
199b5763 |
|
07-Aug-2017 |
Longpeng(Mike) <longpeng2@huawei.com> |
KVM: add spinlock optimization framework If a vcpu exits due to request a user mode spinlock, then the spinlock-holder may be preempted in user mode or kernel mode. (Note that not all architectures trap spin loops in user mode, only AMD x86 and ARM/ARM64 currently do). But if a vcpu exits in kernel mode, then the holder must be preempted in kernel mode, so we should choose a vcpu in kernel mode as a more likely candidate for the lock holder. This introduces kvm_arch_vcpu_in_kernel() to decide whether the vcpu is in kernel-mode when it's preempted. kvm_vcpu_on_spin's new argument says the same of the spinning VCPU. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
72c139ba |
|
26-Jul-2017 |
Ladi Prosek <lprosek@redhat.com> |
KVM: hyperv: support HV_X64_MSR_TSC_FREQUENCY and HV_X64_MSR_APIC_FREQUENCY It has been experimentally confirmed that supporting these two MSRs is one of the necessary conditions for nested Hyper-V to use the TSC page. Modern Windows guests are noticeably slower when they fall back to reading timestamps from the HV_X64_MSR_TIME_REF_COUNT MSR instead of using the TSC page. The newly supported MSRs are advertised with the AccessFrequencyRegs partition privilege flag and CPUID.40000003H:EDX[8] "Support for determining timer frequencies is available" (both outside of the scope of this KVM patch). Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f1ff89ec |
|
20-Jul-2017 |
Roman Kagan <rkagan@virtuozzo.com> |
kvm: x86: hyperv: avoid livelock in oneshot SynIC timers If the SynIC timer message delivery fails due to SINT message slot being busy, there's no point to attempt starting the timer again until we're notified of the slot being released by the guest (via EOM or EOI). Even worse, when a oneshot timer fails to deliver its message, its re-arming with an expiration time in the past leads to immediate retry of the delivery, and so on, without ever letting the guest vcpu to run and release the slot, which results in a livelock. To avoid that, only start the timer when there's no timer message pending delivery. When there is, meaning the slot is busy, the processing will be restarted upon notification from the guest that the slot is released. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
d3457c87 |
|
14-Jul-2017 |
Roman Kagan <rkagan@virtuozzo.com> |
kvm: x86: hyperv: make VP_INDEX managed by userspace Hyper-V identifies vCPUs by Virtual Processor Index, which can be queried via HV_X64_MSR_VP_INDEX msr. It is defined by the spec as a sequential number which can't exceed the maximum number of vCPUs per VM. APIC ids can be sparse and thus aren't a valid replacement for VP indices. Current KVM uses its internal vcpu index as VP_INDEX. However, to make it predictable and persistent across VM migrations, the userspace has to control the value of VP_INDEX. This patch achieves that, by storing vp_index explicitly on vcpu, and allowing HV_X64_MSR_VP_INDEX to be set from the host side. For compatibility it's initialized to KVM vcpu index. Also a few variables are renamed to make clear distinction betweed this Hyper-V vp_index and KVM vcpu_id (== APIC id). Besides, a new capability, KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip attempting msr writes where unsupported, to avoid spamming error logs. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
efc479e6 |
|
22-Jun-2017 |
Roman Kagan <rkagan@virtuozzo.com> |
kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2 There is a flaw in the Hyper-V SynIC implementation in KVM: when message page or event flags page is enabled by setting the corresponding msr, KVM zeroes it out. This is problematic because on migration the corresponding MSRs are loaded on the destination, so the content of those pages is lost. This went unnoticed so far because the only user of those pages was in-KVM hyperv synic timers, which could continue working despite that zeroing. Newer QEMU uses those pages for Hyper-V VMBus implementation, and zeroing them breaks the migration. Besides, in newer QEMU the content of those pages is fully managed by QEMU, so zeroing them is undesirable even when writing the MSRs from the guest side. To support this new scheme, introduce a new capability, KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic pages aren't zeroed out in KVM. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
32ef5517 |
|
05-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> Introduce a trivial, mostly empty <linux/sched/cputime.h> header to prepare for the moving of cputime functionality out of sched.h. Update all code that relies on these facilities. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5613fda9 |
|
30-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime: Convert task/group cputime to nsecs Now that most cputime readers use the transition API which return the task cputime in old style cputime_t, we can safely store the cputime in nsecs. This will eventually make cputime statistics less opaque and more granular. Back and forth convertions between cputime_t and nsecs in order to deal with cputime_t random granularity won't be needed anymore. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f98a3efb |
|
15-Dec-2016 |
Radim Krčmář <rkrcmar@redhat.com> |
KVM: x86: use delivery to self in hyperv synic Interrupt to self can be sent without knowing the APIC ID. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3f5ad8be |
|
12-Dec-2016 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: hyperv: fix locking of struct kvm_hv fields Introduce a new mutex to avoid an AB-BA deadlock between kvm->lock and vcpu->mutex. Protect accesses in kvm_hv_setup_tsc_page too, as suggested by Roman. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ecd8a8c2 |
|
06-Nov-2016 |
Jiang Biao <jiang.biao2@zte.com.cn> |
kvm: x86: hyperv: make function static to avoid compiling warning synic_set_irq is only used in hyperv.c, and should be static to avoid compiling warning when with -Wmissing-prototypes option. Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
095cf55d |
|
07-Feb-2016 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: Hyper-V tsc page setup Lately tsc page was implemented but filled with empty values. This patch setup tsc page scale and offset based on vcpu tsc, tsc_khz and HV_X64_MSR_TIME_REF_COUNT value. The valid tsc page drops HV_X64_MSR_TIME_REF_COUNT msr reads count to zero which potentially improves performance. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Peter Hornyack <peterhornyack@google.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> [Computation of TSC page parameters rewritten to use the Linux timekeeper parameters. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
108b249c |
|
01-Sep-2016 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: introduce get_kvmclock_ns Introduce a function that reads the exact nanoseconds value that is provided to the guest in kvmclock. This crystallizes the notion of kvmclock as a thin veneer over a stable TSC, that the guest will (hopefully) convert with NTP. In other words, kvmclock is *not* a paravirtualized host-to-guest NTP. Drop the get_kernel_ns() function, that was used both to get the base value of the master clock and to get the current value of kvmclock. The former use is replaced by ktime_get_boot_ns(), the latter is the purpose of get_kernel_ns(). This also allows KVM to provide a Hyper-V time reference counter that is synchronized with the time that is computed from the TSC page. Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a2b5c3c0 |
|
29-Mar-2016 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled If SynIC is disabled, there is nothing that userspace can do to handle these exits; on the other hand, userspace probably will not know about KVM_EXIT_HYPERV_HCALL and complain about it or even exit. Just prevent anything bad from happening by handling the hypercall in KVM and returning an "invalid hypercall" code. Fixes: 83326e43f27e9a8a501427a0060f8af519a39bb2 Cc: Andrey Smetanin <irqlevel@gmail.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
83326e43 |
|
11-Feb-2016 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V VMBus hypercall userspace exit The patch implements KVM_EXIT_HYPERV userspace exit functionality for Hyper-V VMBus hypercalls: HV_X64_HCALL_POST_MESSAGE, HV_X64_HCALL_SIGNAL_EVENT. Changes v3: * use vcpu->arch.complete_userspace_io to setup hypercall result Changes v2: * use KVM_EXIT_HYPERV for hypercalls Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b2fdc257 |
|
11-Feb-2016 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Reject Hyper-V hypercall continuation Currently we do not support Hyper-V hypercall continuation so reject it. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0d9c055e |
|
11-Feb-2016 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Pass return code of kvm_emulate_hypercall Pass the return code from kvm_emulate_hypercall on to the caller, in order to allow it to indicate to the userspace that the hypercall has to be handled there. Also adjust all the existing code paths to return 1 to make sure the hypercall isn't passed to the userspace without setting kvm_run appropriately. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
8ed6d767 |
|
11-Feb-2016 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Rename Hyper-V long spin wait hypercall Rename HV_X64_HV_NOTIFY_LONG_SPIN_WAIT by HVCALL_NOTIFY_LONG_SPIN_WAIT, so the name is more consistent with the other hypercalls. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org [Change name, Andrey used HV_X64_HCALL_NOTIFY_LONG_SPIN_WAIT. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ac3e5fca |
|
23-Dec-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V SynIC timers tracepoints Trace the following Hyper SynIC timers events: * periodic timer start * one-shot timer start * timer callback * timer expiration and message delivery result * timer config setup * timer count setup * timer cleanup Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
18659a9c |
|
23-Dec-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V SynIC tracepoints Trace the following Hyper SynIC events: * set msr * set sint irq * ack sint * sint irq eoi Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f3b138c5 |
|
28-Dec-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Update SynIC timers on guest entry only Consolidate updating the Hyper-V SynIC timers in a single place: on guest entry in processing KVM_REQ_HV_STIMER request. This simplifies the overall logic, and makes sure the most current state of msrs and guest clock is used for arming the timers (to achieve that, KVM_REQ_HV_STIMER has to be processed after KVM_REQ_CLOCK_UPDATE). Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7be58a64 |
|
28-Dec-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Skip SynIC vector check for QEMU side QEMU zero-inits Hyper-V SynIC vectors. We should allow that, and don't reject zero values if set by the host. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
23a3b201 |
|
28-Dec-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V fix SynIC timer disabling condition Hypervisor Function Specification(HFS) doesn't require to disable SynIC timer at timer config write if timer->count = 0. So drop this check, this allow to load timers MSR's during migration restore, because config are set before count in QEMU side. Also fix condition according to HFS doc(15.3.1): "It is not permitted to set the SINTx field to zero for an enabled timer. If attempted, the timer will be marked disabled (that is, bit 0 cleared) immediately." Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0cdeabb1 |
|
28-Dec-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Reorg stimer_expiration() to better control timer restart Split stimer_expiration() into two parts - timer expiration message sending and timer restart/cleanup based on timer state(config). This also fixes a bug where a one-shot timer message whose delivery failed once would get lost for good. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f808495d |
|
28-Dec-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V unify stimer_start() and stimer_restart() This will be used in future to start Hyper-V SynIC timer in several places by one logic in one function. Changes v2: * drop stimer->count == 0 check inside stimer_start() * comment stimer_start() assumptions Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
019b9781 |
|
28-Dec-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Drop stimer_stop() function The function stimer_stop() is called in one place so remove the function and replace it's call by function content. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1ac1b65a |
|
28-Dec-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V timers fix incorrect logical operation Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
481d2bcc |
|
14-Dec-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Remove Hyper-V SynIC timer stopping It's possible that guest send us Hyper-V EOM at the middle of Hyper-V SynIC timer running, so we start processing of Hyper-V SynIC timers in vcpu context and stop the Hyper-V SynIC timer unconditionally: host guest ------------------------------------------------------------------------------ start periodic stimer start periodic timer timer expires after 15ms send expiration message into guest restart periodic timer timer expires again after 15 ms msg slot is still not cleared so setup ->msg_pending (1) restart periodic timer process timer msg and clear slot ->msg_pending was set: send EOM into host received EOM kvm_make_request(KVM_REQ_HV_STIMER) kvm_hv_process_stimers(): ... stimer_stop() if (time_now >= stimer->exp_time) stimer_expiration(stimer); Because the timer was rearmed at (1), time_now < stimer->exp_time and stimer_expiration is not called. The timer then never fires. The patch fixes such situation by not stopping Hyper-V SynIC timer at all, because it's safe to restart it without stop in vcpu context and timer callback always returns HRTIMER_NORESTART. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1f4b34f8 |
|
30-Nov-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V SynIC timers Per Hyper-V specification (and as required by Hyper-V-aware guests), SynIC provides 4 per-vCPU timers. Each timer is programmed via a pair of MSRs, and signals expiration by delivering a special format message to the configured SynIC message slot and triggering the corresponding synthetic interrupt. Note: as implemented by this patch, all periodic timers are "lazy" (i.e. if the vCPU wasn't scheduled for more than the timer period the timer events are lost), regardless of the corresponding configuration MSR. If deemed necessary, the "catch up" mode (the timer period is shortened until the timer catches up) will be implemented later. Changes v2: * Use remainder to calculate periodic timer expiration time Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
765eaa0f |
|
30-Nov-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V SynIC message slot pending clearing at SINT ack The SynIC message protocol mandates that the message slot is claimed by atomically setting message type to something other than HVMSG_NONE. If another message is to be delivered while the slot is still busy, message pending flag is asserted to indicate to the guest that the hypervisor wants to be notified when the slot is released. To make sure the protocol works regardless of where the message sources are (kernel or userspace), clear the pending flag on SINT ACK notification, and let the message sources compete for the slot again. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
93bf4172 |
|
30-Nov-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V internal helper to read MSR HV_X64_MSR_TIME_REF_COUNT This helper will be used also in Hyper-V SynIC timers implementation. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
db397571 |
|
10-Nov-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V kvm exit A new vcpu exit is introduced to notify the userspace of the changes in Hyper-V SynIC configuration triggered by guest writing to the corresponding MSRs. Changes v4: * exit into userspace only if guest writes into SynIC MSR's Changes v3: * added KVM_EXIT_HYPERV types and structs notes into docs Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
5c919412 |
|
10-Nov-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V synthetic interrupt controller SynIC (synthetic interrupt controller) is a lapic extension, which is controlled via MSRs and maintains for each vCPU - 16 synthetic interrupt "lines" (SINT's); each can be configured to trigger a specific interrupt vector optionally with auto-EOI semantics - a message page in the guest memory with 16 256-byte per-SINT message slots - an event flag page in the guest memory with 16 2048-bit per-SINT event flag areas The host triggers a SINT whenever it delivers a new message to the corresponding slot or flips an event flag bit in the corresponding area. The guest informs the host that it can try delivering a message by explicitly asserting EOI in lapic or writing to End-Of-Message (EOM) MSR. The userspace (qemu) triggers interrupts and receives EOM notifications via irqfd with resampler; for that, a GSI is allocated for each configured SINT, and irq_routing api is extended to support GSI-SINT mapping. Changes v4: * added activation of SynIC by vcpu KVM_ENABLE_CAP * added per SynIC active flag * added deactivation of APICv upon SynIC activation Changes v3: * added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into docs Changes v2: * do not use posted interrupts for Hyper-V SynIC AutoEOI vectors * add Hyper-V SynIC vectors into EOI exit bitmap * Hyper-V SyniIC SINT msr write logic simplified Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9eec50b8 |
|
15-Sep-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V HV_X64_MSR_VP_RUNTIME support HV_X64_MSR_VP_RUNTIME msr used by guest to get "the time the virtual processor consumes running guest code, and the time the associated logical processor spends running hypervisor code on behalf of that guest." Calculation of this time is performed by task_cputime_adjusted() for vcpu task. Necessary to support loading of winhv.sys in guest, which in turn is required to support Windows VMBus. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e516cebb |
|
15-Sep-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: Hyper-V HV_X64_MSR_RESET msr HV_X64_MSR_RESET msr is used by Hyper-V based Windows guest to reset guest VM by hypervisor. Necessary to support loading of winhv.sys in guest, which in turn is required to support Windows VMBus. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e7d9513b |
|
03-Jul-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: added hyper-v crash msrs into kvm hyperv context Added kvm Hyper-V context hv crash variables as storage of Hyper-V crash msrs. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Peter Hornyack <peterhornyack@google.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e83d5887 |
|
03-Jul-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: move Hyper-V MSR's/hypercall code into hyperv.c file This patch introduce Hyper-V related source code file - hyperv.c and per vm and per vcpu hyperv context structures. All Hyper-V MSR's and hypercall code moved into hyperv.c. All Hyper-V kvm/vcpu fields moved into appropriate hyperv context structures. Copyrights and authors information copied from x86.c to hyperv.c. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Peter Hornyack <peterhornyack@google.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|