#
451a7078 |
|
27-Feb-2024 |
David Woodhouse <dwmw@amazon.co.uk> |
KVM: x86/xen: improve accuracy of Xen timers A test program such as http://david.woodhou.se/timerlat.c confirms user reports that timers are increasingly inaccurate as the lifetime of a guest increases. Reporting the actual delay observed when asking for 100µs of sleep, it starts off OK on a newly-launched guest but gets worse over time, giving incorrect sleep times: root@ip-10-0-193-21:~# ./timerlat -c -n 5 00000000 latency 103243/100000 (3.2430%) 00000001 latency 103243/100000 (3.2430%) 00000002 latency 103242/100000 (3.2420%) 00000003 latency 103245/100000 (3.2450%) 00000004 latency 103245/100000 (3.2450%) The biggest problem is that get_kvmclock_ns() returns inaccurate values when the guest TSC is scaled. The guest sees a TSC value scaled from the host TSC by a mul/shift conversion (hopefully done in hardware). The guest then converts that guest TSC value into nanoseconds using the mul/shift conversion given to it by the KVM pvclock information. But get_kvmclock_ns() performs only a single conversion directly from host TSC to nanoseconds, giving a different result. A test program at http://david.woodhou.se/tsdrift.c demonstrates the cumulative error over a day. It's non-trivial to fix get_kvmclock_ns(), although I'll come back to that. The actual guest hv_clock is per-CPU, and *theoretically* each vCPU could be running at a *different* frequency. But this patch is needed anyway because... The other issue with Xen timers was that the code would snapshot the host CLOCK_MONOTONIC at some point in time, and then... after a few interrupts may have occurred, some preemption perhaps... would also read the guest's kvmclock. Then it would proceed under the false assumption that those two happened at the *same* time. Any time which *actually* elapsed between reading the two clocks was introduced as inaccuracies in the time at which the timer fired. Fix it to use a variant of kvm_get_time_and_clockread(), which reads the host TSC just *once*, then use the returned TSC value to calculate the kvmclock (making sure to do that the way the guest would instead of making the same mistake get_kvmclock_ns() does). Sadly, hrtimers based on CLOCK_MONOTONIC_RAW are not supported, so Xen timers still have to use CLOCK_MONOTONIC. In practice the difference between the two won't matter over the timescales involved, as the *absolute* values don't matter; just the delta. This does mean a new variant of kvm_get_time_and_clockread() is needed; called kvm_get_monotonic_and_clockread() because that's what it does. Fixes: 536395260582 ("KVM: x86/xen: handle PV timers oneshot mode") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20240227115648.3104-2-dwmw2@infradead.org [sean: massage moved comment, tweak if statement formatting] Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
812d4323 |
|
05-Dec-2023 |
Like Xu <likexu@tencent.com> |
KVM: x86/pmu: Explicitly check NMI from guest to reducee false positives Explicitly check that the source of external interrupt is indeed an NMI in kvm_arch_pmi_in_guest(), which reduces perf-kvm false positive samples (host samples labelled as guest samples) generated by perf/core NMI mode if an NMI arrives after VM-Exit, but before kvm_after_interrupt(): # test: perf-record + cpu-cycles:HP (which collects host-only precise samples) # Symbol Overhead sys usr guest sys guest usr # ....................................... ........ ........ ........ ......... ......... # # Before: [g] entry_SYSCALL_64 24.63% 0.00% 0.00% 24.63% 0.00% [g] syscall_return_via_sysret 23.23% 0.00% 0.00% 23.23% 0.00% [g] files_lookup_fd_raw 6.35% 0.00% 0.00% 6.35% 0.00% # After: [k] perf_adjust_freq_unthr_context 57.23% 57.23% 0.00% 0.00% 0.00% [k] __vmx_vcpu_run 4.09% 4.09% 0.00% 0.00% 0.00% [k] vmx_update_host_rsp 3.17% 3.17% 0.00% 0.00% 0.00% In the above case, perf records the samples labelled '[g]', the RIPs behind the weird samples are actually being queried by perf_instruction_pointer() after determining whether it's in GUEST state or not, and here's the issue: If VM-Exit is caused by a non-NMI interrupt (such as hrtimer_interrupt) and at least one PMU counter is enabled on host, the kvm_arch_pmi_in_guest() will remain true (KVM_HANDLING_IRQ is set) until kvm_before_interrupt(). During this window, if a PMI occurs on host (since the KVM instructions on host are being executed), the control flow, with the help of the host NMI context, will be transferred to perf/core to generate performance samples, thus perf_instruction_pointer() and perf_guest_get_ip() is called. Since kvm_arch_pmi_in_guest() only checks if there is an interrupt, it may cause perf/core to mistakenly assume that the source RIP of the host NMI belongs to the guest world and use perf_guest_get_ip() to get the RIP of a vCPU that has already exited by a non-NMI interrupt. Error samples are recorded and presented to the end-user via perf-report. Such false positive samples could be eliminated by explicitly determining if the exit reason is KVM_HANDLING_NMI. Note that when VM-exit is indeed triggered by PMI and before HANDLING_NMI is cleared, it's also still possible that another PMI is generated on host. Also for perf/core timer mode, the false positives are still possible since those non-NMI sources of interrupts are not always being used by perf/core. For events that are host-only, perf/core can and should eliminate false positives by checking event->attr.exclude_guest, i.e. events that are configured to exclude KVM guests should never fire in the guest. Events that are configured to count host and guest are trickier, perhaps impossible to handle with 100% accuracy? And regardless of what accuracy is provided by perf/core, improving KVM's accuracy is cheap and easy, with no real downsides. Fixes: dd60d217062f ("KVM: x86: Fix perf timer mode IP reporting") Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20231206032054.55070-1-likexu@tencent.com [sean: massage changelog, squash !!in_nmi() fixup from Like] Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
93d1c9f4 |
|
13-Sep-2023 |
Robert Hoo <robert.hu@linux.intel.com> |
KVM: x86: Virtualize LAM for supervisor pointer Add support to allow guests to set the new CR4 control bit for LAM and add implementation to get untagged address for supervisor pointers. LAM modifies the canonicality check applied to 64-bit linear addresses for data accesses, allowing software to use of the untranslated address bits for metadata and masks the metadata bits before using them as linear addresses to access memory. LAM uses CR4.LAM_SUP (bit 28) to configure and enable LAM for supervisor pointers. It also changes VMENTER to allow the bit to be set in VMCS's HOST_CR4 and GUEST_CR4 to support virtualization. Note CR4.LAM_SUP is allowed to be set even not in 64-bit mode, but it will not take effect since LAM only applies to 64-bit linear addresses. Move CR4.LAM_SUP out of CR4_RESERVED_BITS, its reservation depends on vcpu supporting LAM or not. Leave it intercepted to prevent guest from setting the bit if LAM is not exposed to guest as well as to avoid vmread every time when KVM fetches its value, with the expectation that guest won't toggle the bit frequently. Set CR4.LAM_SUP bit in the emulated IA32_VMX_CR4_FIXED1 MSR for guests to allow guests to enable LAM for supervisor pointers in nested VMX operation. Hardware is not required to do TLB flush when CR4.LAM_SUP toggled, KVM doesn't need to emulate TLB flush based on it. There's no other features or vmx_exec_controls connection, and no other code needed in {kvm,vmx}_set_cr4(). Skip address untag for instruction fetches (which includes branch targets), operand of INVLPG instructions, and implicit system accesses, all of which are not subject to untagging. Note, get_untagged_addr() isn't invoked for implicit system accesses as there is no reason to do so, but check the flag anyways for documentation purposes. Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: Xuelian Guo <xuelian.guo@intel.com> Link: https://lore.kernel.org/r/20230913124227.12574-11-binbin.wu@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
5d6d6a7d |
|
05-Oct-2023 |
David Woodhouse <dwmw@amazon.co.uk> |
KVM: x86: Refine calculation of guest wall clock to use a single TSC read When populating the guest's PV wall clock information, KVM currently does a simple 'kvm_get_real_ns() - get_kvmclock_ns(kvm)'. This is an antipattern which should be avoided; when working with the relationship between two clocks, it's never correct to obtain one of them "now" and then the other at a slightly different "now" after an unspecified period of preemption (which might not even be under the control of the kernel, if this is an L1 hosting an L2 guest under nested virtualization). Add a kvm_get_wall_clock_epoch() function to return the guest wall clock epoch in nanoseconds using the same method as __get_kvmclock() — by using kvm_get_walltime_and_clockread() to calculate both the wall clock and KVM clock time from a *single* TSC reading. The condition using get_cpu_tsc_khz() is equivalent to the version in __get_kvmclock() which separately checks for the CONSTANT_TSC feature or the per-CPU cpu_tsc_khz. Which is what get_cpu_tsc_khz() does anyway. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/bfc6d3d7cfb88c47481eabbf5a30a264c58c7789.camel@infradead.org Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
a2fd5d02 |
|
06-Jun-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Snapshot host's MSR_IA32_ARCH_CAPABILITIES Snapshot the host's MSR_IA32_ARCH_CAPABILITIES, if it's supported, instead of reading the MSR every time KVM wants to query the host state, e.g. when initializing the default value during vCPU creation. The paths that query ARCH_CAPABILITIES aren't particularly performance sensitive, but creating vCPUs is a frequent enough operation that burning 8 bytes is a good trade-off. Alternatively, KVM could add a field in kvm_caps and thus skip the on-demand calculations entirely, but a pure snapshot isn't possible due to the way KVM handles the l1tf_vmx_mitigation module param. And unlike the other "supported" fields in kvm_caps, KVM doesn't enforce the "supported" value, i.e. KVM treats ARCH_CAPABILITIES like a CPUID leaf and lets userspace advertise whatever it wants. Those problems are solvable, but it's not clear there is real benefit versus snapshotting the host value, and grabbing the host value will allow additional cleanup of KVM's FB_CLEAR_CTRL code. Link: https://lore.kernel.org/all/20230524061634.54141-2-chao.gao@intel.com Cc: Chao Gao <chao.gao@intel.com> Cc: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230607004311.1420507-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
3a5f4907 |
|
11-May-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Make kvm_mtrr_valid() static now that there are no external users Make kvm_mtrr_valid() local to mtrr.c now that it's not used to check the validity of a PAT MSR value. Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230511233351.635053-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
55cd57b5 |
|
04-Apr-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Filter out XTILE_CFG if XTILE_DATA isn't permitted Filter out XTILE_CFG from the supported XCR0 reported to userspace if the current process doesn't have access to XTILE_DATA. Attempting to set XTILE_CFG in XCR0 will #GP if XTILE_DATA is also not set, and so keeping XTILE_CFG as supported results in explosions if userspace feeds KVM_GET_SUPPORTED_CPUID back into KVM and the guest doesn't sanity check CPUID. Fixes: 445ecdf79be0 ("kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID") Reported-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Aaron Lewis <aaronlewis@google.com> Tested-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20230405004520.421768-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
6be3ae45 |
|
04-Apr-2023 |
Aaron Lewis <aaronlewis@google.com> |
KVM: x86: Add a helper to handle filtering of unpermitted XCR0 features Add a helper, kvm_get_filtered_xcr0(), to dedup code that needs to account for XCR0 features that require explicit opt-in on a per-process basis. In addition to documenting when KVM should/shouldn't consult xstate_get_guest_group_perm(), the helper will also allow sanitizing the filtered XCR0 to avoid enumerating architecturally illegal XCR0 values, e.g. XTILE_CFG without XTILE_DATA. No functional changes intended. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Mingwei Zhang <mizhang@google.com> [sean: rename helper, move to x86.h, massage changelog] Reviewed-by: Aaron Lewis <aaronlewis@google.com> Tested-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20230405004520.421768-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
5757f5b9 |
|
10-Mar-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Add macros to track first...last VMX feature MSRs Add macros to track the range of VMX feature MSRs that are emulated by KVM to reduce the maintenance cost of extending the set of emulated MSRs. Note, KVM doesn't necessarily emulate all known/consumed VMX MSRs, e.g. PROCBASED_CTLS3 is consumed by KVM to enable IPI virtualization, but is not emulated as KVM doesn't emulate/virtualize IPI virtualization for nested guests. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
fb3146b4 |
|
10-Mar-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Add a helper to query whether or not a vCPU has ever run Add a helper to query if a vCPU has run so that KVM doesn't have to open code the check on last_vmentry_cpu being set to a magic value. No functional change intended. Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Like Xu <like.xu.linux@gmail.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
68f7c82a |
|
21-Mar-2023 |
Binbin Wu <binbin.wu@linux.intel.com> |
KVM: x86: Change return type of is_long_mode() to bool Change return type of is_long_mode() to bool to avoid implicit cast, as literally every user of is_long_mode() treats its return value as a boolean. Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20230322045824.22970-5-binbin.wu@linux.intel.com Reviewed-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
bede6eb4 |
|
21-Mar-2023 |
Binbin Wu <binbin.wu@linux.intel.com> |
KVM: x86: Use boolean return value for is_{pae,pse,paging}() Convert is_{pae,pse,paging}() to use kvm_is_cr{0,4}_bit_set() and return bools. Returning an "int" requires not one, but two implicit casts, first from "unsigned long" to "int", and then again to a "bool". Both casts are more than a bit dangerous; the ulong=>int casts would drop a bit on 64-bit kernels _if_ the bits in question weren't in the lower 32 bits, and the int=>bool cast can result in false negatives/positives, e.g. see commit 0c928ff26bd6 ("KVM: SVM: Fix benign "bool vs. int" comparison in svm_set_cr0()"). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20230322045824.22970-3-binbin.wu@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
607475cf |
|
21-Mar-2023 |
Binbin Wu <binbin.wu@linux.intel.com> |
KVM: x86: Add helpers to query individual CR0/CR4 bits Add helpers to check if a specific CR0/CR4 bit is set to avoid a plethora of implicit casts from the "unsigned long" return of kvm_read_cr*_bits(), and to make each caller's intent more obvious. Defer converting helpers that do truly ugly casts from "unsigned long" to "int", e.g. is_pse(), to a future commit so that their conversion is more isolated. Opportunistically drop the superfluous pcid_enabled from kvm_set_cr3(); the local variable is used only once, immediately after its declaration. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20230322045824.22970-2-binbin.wu@linux.intel.com [sean: move "obvious" conversions to this commit, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
e76ae527 |
|
24-Jan-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: x86/pmu: Gate all "unimplemented MSR" prints on report_ignored_msrs Add helpers to print unimplemented MSR accesses and condition all such prints on report_ignored_msrs, i.e. honor userspace's request to not print unimplemented MSRs. Even though vcpu_unimpl() is ratelimited, printing can still be problematic, e.g. if a print gets stalled when host userspace is writing MSRs during live migration, an effective stall can result in very noticeable disruption in the guest. E.g. the profile below was taken while calling KVM_SET_MSRS on the PMU counters while the PMU was disabled in KVM. - 99.75% 0.00% [.] __ioctl - __ioctl - 99.74% entry_SYSCALL_64_after_hwframe do_syscall_64 sys_ioctl - do_vfs_ioctl - 92.48% kvm_vcpu_ioctl - kvm_arch_vcpu_ioctl - 85.12% kvm_set_msr_ignored_check svm_set_msr kvm_set_msr_common printk vprintk_func vprintk_default vprintk_emit console_unlock call_console_drivers univ8250_console_write serial8250_console_write uart_console_write Reported-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230124234905.3774678-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
11df586d |
|
12-Dec-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: VMX: Handle NMI VM-Exits in noinstr region Move VMX's handling of NMI VM-Exits into vmx_vcpu_enter_exit() so that the NMI is handled prior to leaving the safety of noinstr. Handling the NMI after leaving noinstr exposes the kernel to potential ordering problems as an instrumentation-induced fault, e.g. #DB, #BP, #PF, etc. will unblock NMIs when IRETing back to the faulting instruction. Reported-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221213060912.654668-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
bec46859 |
|
05-Oct-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Track supported PERF_CAPABILITIES in kvm_caps Track KVM's supported PERF_CAPABILITIES in kvm_caps instead of computing the supported capabilities on the fly every time. Using kvm_caps will also allow for future cleanups as the kvm_caps values can be used directly in common x86 code. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Like Xu <likexu@tencent.com> Message-Id: <20221006000314.73240-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1b7a1b78 |
|
20-Sep-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed Rename and invert kvm_vcpu_latch_init() to kvm_apic_init_sipi_allowed() so as to match the behavior of {interrupt,nmi,smi}_allowed(), and expose the helper so that it can be used by kvm_vcpu_has_events() to determine whether or not an INIT or SIPI is pending _and_ can be taken immediately. Opportunistically replaced usage of the "latch" terminology with "blocked" and/or "allowed", again to align with KVM's terminology used for all other event types. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7055fb11 |
|
30-Aug-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions Treat pending TRIPLE_FAULTS as pending exceptions. A triple fault is an exception for all intents and purposes, it's just not tracked as such because there's no vector associated the exception. E.g. if userspace were to set vcpu->request_interrupt_window while running L2 and L2 hit a triple fault, a triple fault nested VM-Exit should be synthesized to L1 before exiting to userspace with KVM_EXIT_IRQ_WINDOW_OPEN. Link: https://lore.kernel.org/all/YoVHAIGcFgJit1qp@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-23-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7709aba8 |
|
30-Aug-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Morph pending exceptions to pending VM-Exits at queue time Morph pending exceptions to pending VM-Exits (due to interception) when the exception is queued instead of waiting until nested events are checked at VM-Entry. This fixes a longstanding bug where KVM fails to handle an exception that occurs during delivery of a previous exception, KVM (L0) and L1 both want to intercept the exception (e.g. #PF for shadow paging), and KVM determines that the exception is in the guest's domain, i.e. queues the new exception for L2. Deferring the interception check causes KVM to esclate various combinations of injected+pending exceptions to double fault (#DF) without consulting L1's interception desires, and ends up injecting a spurious #DF into L2. KVM has fudged around the issue for #PF by special casing emulated #PF injection for shadow paging, but the underlying issue is not unique to shadow paging in L0, e.g. if KVM is intercepting #PF because the guest has a smaller maxphyaddr and L1 (but not L0) is using shadow paging. Other exceptions are affected as well, e.g. if KVM is intercepting #GP for one of SVM's workaround or for the VMware backdoor emulation stuff. The other cases have gone unnoticed because the #DF is spurious if and only if L1 resolves the exception, e.g. KVM's goofs go unnoticed if L1 would have injected #DF anyways. The hack-a-fix has also led to ugly code, e.g. bailing from the emulator if #PF injection forced a nested VM-Exit and the emulator finds itself back in L1. Allowing for direct-to-VM-Exit queueing also neatly solves the async #PF in L2 mess; no need to set a magic flag and token, simply queue a #PF nested VM-Exit. Deal with event migration by flagging that a pending exception was queued by userspace and check for interception at the next KVM_RUN, e.g. so that KVM does the right thing regardless of the order in which userspace restores nested state vs. event state. When "getting" events from userspace, simply drop any pending excpetion that is destined to be intercepted if there is also an injected exception to be migrated. Ideally, KVM would migrate both events, but that would require new ABI, and practically speaking losing the event is unlikely to be noticed, let alone fatal. The injected exception is captured, RIP still points at the original faulting instruction, etc... So either the injection on the target will trigger the same intercepted exception, or the source of the intercepted exception was transient and/or non-deterministic, thus dropping it is ok-ish. Fixes: a04aead144fd ("KVM: nSVM: fix running nested guests when npt=0") Fixes: feaf0c7dc473 ("KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2") Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-22-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d4963e31 |
|
30-Aug-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Make kvm_queued_exception a properly named, visible struct Move the definition of "struct kvm_queued_exception" out of kvm_vcpu_arch in anticipation of adding a second instance in kvm_vcpu_arch to handle exceptions that occur when vectoring an injected exception and are morphed to VM-Exit instead of leading to #DF. Opportunistically take advantage of the churn to rename "nr" to "vector". No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-15-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c33f6f222 |
|
07-Jun-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits Split the common x86 parts of kvm_is_valid_cr4(), i.e. the reserved bits checks, into a separate helper, __kvm_is_valid_cr4(), and export only the inner helper to vendor code in order to prevent nested VMX from calling back into vmx_is_valid_cr4() via kvm_is_valid_cr4(). On SVM, this is a nop as SVM doesn't place any additional restrictions on CR4. On VMX, this is also currently a nop, but only because nested VMX is missing checks on reserved CR4 bits for nested VM-Enter. That bug will be fixed in a future patch, and could simply use kvm_is_valid_cr4() as-is, but nVMX has _another_ bug where VMXON emulation doesn't enforce VMX's restrictions on CR0/CR4. The cleanest and most intuitive way to fix the VMXON bug is to use nested_host_cr{0,4}_valid(). If the CR4 variant routes through kvm_is_valid_cr4(), using nested_host_cr4_valid() won't do the right thing for the VMXON case as vmx_is_valid_cr4() enforces VMX's restrictions if and only if the vCPU is post-VMXON. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2f4073e0 |
|
24-May-2022 |
Tao Xu <tao3.xu@intel.com> |
KVM: VMX: Enable Notify VM exit There are cases that malicious virtual machines can cause CPU stuck (due to event windows don't open up), e.g., infinite loop in microcode when nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and IRQ) can be delivered. It leads the CPU to be unavailable to host or other VMs. VMM can enable notify VM exit that a VM exit generated if no event window occurs in VM non-root mode for a specified amount of time (notify window). Feature enabling: - The new vmcs field SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced to enable this feature. VMM can set NOTIFY_WINDOW vmcs field to adjust the expected notify window. - Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space can query and enable this feature in per-VM scope. The argument is a 64bit value: bits 63:32 are used for notify window, and bits 31:0 are for flags. Current supported flags: - KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify window provided. - KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace once the exits happen. - It's safe to even set notify window to zero since an internal hardware threshold is added to vmcs.notify_window. VM exit handling: - Introduce a vcpu state notify_window_exits to records the count of notify VM exits and expose it through the debugfs. - Notify VM exit can happen incident to delivery of a vector event. Allow it in KVM. - Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID bit is set. Nested handling - Nested notify VM exits are not supported yet. Keep the same notify window control in vmcs02 as vmcs01, so that L1 can't escape the restriction of notify VM exits through launching L2 VM. Notify VM exit is defined in latest Intel Architecture Instruction Set Extensions Programming Reference, chapter 9.2. Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220524135624.22988-5-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
938c8745 |
|
24-May-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Introduce "struct kvm_caps" to track misc caps/settings Add kvm_caps to hold a variety of capabilites and defaults that aren't handled by kvm_cpu_caps because they aren't CPUID bits in order to reduce the amount of boilerplate code required to add a new feature. The vast majority (all?) of the caps interact with vendor code and are written only during initialization, i.e. should be tagged __read_mostly, declared extern in x86.h, and exported. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220524135624.22988-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
cb00a70b |
|
19-Jan-2022 |
David Matlack <dmatlack@google.com> |
KVM: x86/mmu: Split huge pages mapped by the TDP MMU during KVM_CLEAR_DIRTY_LOG When using KVM_DIRTY_LOG_INITIALLY_SET, huge pages are not write-protected when dirty logging is enabled on the memslot. Instead they are write-protected once userspace invokes KVM_CLEAR_DIRTY_LOG for the first time and only for the specific sub-region being cleared. Enhance KVM_CLEAR_DIRTY_LOG to also try to split huge pages prior to write-protecting to avoid causing write-protection faults on vCPU threads. This also allows userspace to smear the cost of huge page splitting across multiple ioctls, rather than splitting the entire memslot as is the case when initially-all-set is not used. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-17-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1fb85d06 |
|
31-Jan-2022 |
Adrian Hunter <adrian.hunter@intel.com> |
x86: Share definition of __is_canonical_address() Reduce code duplication by moving canonical address code to a common header file. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220131072453.2839535-3-adrian.hunter@intel.com
|
#
b2d2af7e |
|
01-Feb-2022 |
Mark Rutland <mark.rutland@arm.com> |
kvm/x86: rework guest entry logic For consistency and clarity, migrate x86 over to the generic helpers for guest timing and lockdep/RCU/tracing management, and remove the x86-specific helpers. Prior to this patch, the guest timing was entered in kvm_guest_enter_irqoff() (called by svm_vcpu_enter_exit() and svm_vcpu_enter_exit()), and was exited by the call to vtime_account_guest_exit() within vcpu_enter_guest(). To minimize duplication and to more clearly balance entry and exit, both entry and exit of guest timing are placed in vcpu_enter_guest(), using the new guest_timing_{enter,exit}_irqoff() helpers. When context tracking is used a small amount of additional time will be accounted towards guests; tick-based accounting is unnaffected as IRQs are disabled at this point and not enabled until after the return from the guest. This also corrects (benign) mis-balanced context tracking accounting introduced in commits: ae95f566b3d22ade ("KVM: X86: TSCDEADLINE MSR emulation fastpath") 26efe2fd92e50822 ("KVM: VMX: Handle preemption timer fastpath") Where KVM can enter a guest multiple times, calling vtime_guest_enter() without a corresponding call to vtime_account_guest_exit(), and with vtime_account_system() called when vtime_account_guest() should be used. As account_system_time() checks PF_VCPU and calls account_guest_time(), this doesn't result in any functional problem, but is unnecessarily confusing. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Message-Id: <20220201132926.3301912-4-mark.rutland@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4732f244 |
|
11-Jan-2022 |
Like Xu <likexu@tencent.com> |
KVM: x86: Making the module parameter of vPMU more common The new module parameter to control PMU virtualization should apply to Intel as well as AMD, for situations where userspace is not trusted. If the module parameter allows PMU virtualization, there could be a new KVM_CAP or guest CPUID bits whereby userspace can enable/disable PMU virtualization on a per-VM basis. If the module parameter does not allow PMU virtualization, there should be no userspace override, since we have no precedent for authorizing that kind of override. If it's false, other counter-based profiling features (such as LBR including the associated CPUID bits if any) will not be exposed. Change its name from "pmu" to "enable_pmu" as we have temporary variables with the same name in our code like "struct kvm_pmu *pmu". Fixes: b1d66dad65dc ("KVM: x86/svm: Add module param to control PMU virtualization") Suggested-by : Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220111073823.21885-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
55749769 |
|
10-Dec-2021 |
David Woodhouse <dwmw@amazon.co.uk> |
KVM: x86: Fix wall clock writes in Xen shared_info not to mark page dirty When dirty ring logging is enabled, any dirty logging without an active vCPU context will cause a kernel oops. But we've already declared that the shared_info page doesn't get dirty tracking anyway, since it would be kind of insane to mark it dirty every time we deliver an event channel interrupt. Userspace is supposed to just assume it's always dirty any time a vCPU can run or event channels are routed. So stop using the generic kvm_write_wall_clock() and just write directly through the gfn_to_pfn_cache that we already have set up. We can make kvm_write_wall_clock() static in x86.c again now, but let's not remove the 'sec_hi_ofs' argument even though it's not used yet. At some point we *will* want to use that for KVM guests too. Fixes: 629b5348841a ("KVM: x86/xen: update wallclock region") Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211210163625.2886-6-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
db215756 |
|
10-Nov-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: More precisely identify NMI from guest when handling PMI Differentiate between IRQ and NMI for KVM's PMC overflow callback, which was originally invoked in response to an NMI that arrived while the guest was running, but was inadvertantly changed to fire on IRQs as well when support for perf without PMU/NMI was added to KVM. In practice, this should be a nop as the PMC overflow callback shouldn't be reached, but it's a cheap and easy fix that also better documents the situation. Note, this also doesn't completely prevent false positives if perf somehow ends up calling into KVM, e.g. an NMI can arrive in host after KVM sets its flag. Fixes: dd60d217062f ("KVM: x86: Fix perf timer mode IP reporting") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20211111020738.2512932-12-seanjc@google.com
|
#
73cd107b |
|
10-Nov-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable Use the generic kvm_running_vcpu plus a new 'handling_intr_from_guest' variable in kvm_arch_vcpu instead of the semi-redundant current_vcpu. kvm_before/after_interrupt() must be called while the vCPU is loaded, (which protects against preemption), thus kvm_running_vcpu is guaranteed to be non-NULL when handling_intr_from_guest is non-zero. Switching to kvm_get_running_vcpu() will allows moving KVM's perf callbacks to generic code, and the new flag will be used in a future patch to more precisely identify the "NMI from guest" case. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20211111020738.2512932-11-seanjc@google.com
|
#
40e5f908 |
|
24-Nov-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: nVMX: Abide to KVM_REQ_TLB_FLUSH_GUEST request on nested vmentry/vmexit Like KVM_REQ_TLB_FLUSH_CURRENT, the GUEST variant needs to be serviced at nested transitions, as KVM doesn't track requests for L1 vs L2. E.g. if there's a pending flush when a nested VM-Exit occurs, then the flush was requested in the context of L2 and needs to be handled before switching to L1, otherwise the flush for L2 would effectiely be lost. Opportunistically add a helper to handle CURRENT and GUEST as a pair, the logic for when they need to be serviced is identical as both requests are tied to L1 vs. L2, the only difference is the scope of the flush. Reported-by: Lai Jiangshan <jiangshanlai+lkml@gmail.com> Fixes: 07ffaf343e34 ("KVM: nVMX: Sync all PGDs on nested transition with shadow paging") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211125014944.536398-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b5aead00 |
|
23-May-2021 |
Tom Lendacky <thomas.lendacky@amd.com> |
KVM: x86: Assume a 64-bit hypercall for guests with protected state When processing a hypercall for a guest with protected state, currently SEV-ES guests, the guest CS segment register can't be checked to determine if the guest is in 64-bit mode. For an SEV-ES guest, it is expected that communication between the guest and the hypervisor is performed to shared memory using the GHCB. In order to use the GHCB, the guest must have been in long mode, otherwise writes by the guest to the GHCB would be encrypted and not be able to be comprehended by the hypervisor. Create a new helper function, is_64_bit_hypercall(), that assumes the guest is in 64-bit mode when the guest has protected state, and returns true, otherwise invoking is_64_bit_mode() to determine the mode. Update the hypercall related routines to use is_64_bit_hypercall() instead of is_64_bit_mode(). Add a WARN_ON_ONCE() to is_64_bit_mode() to catch occurences of calls to this helper function for a guest running with protected state. Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES") Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <e0b20c770c9d0d1403f23d83e785385104211f74.1621878537.git.thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
dfd3c713 |
|
21-Oct-2021 |
Jim Mattson <jmattson@google.com> |
kvm: x86: Remove stale declaration of kvm_no_apic_vcpu This variable was renamed to kvm_has_noapic_vcpu in commit 6e4e3b4df4e3 ("KVM: Stop using deprecated jump label APIs"). Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20211021185449.3471763-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
65297341 |
|
09-Aug-2021 |
Uros Bizjak <ubizjak@gmail.com> |
KVM: x86: Move declaration of kvm_spurious_fault() to x86.h Move the declaration of kvm_spurious_fault() to KVM's "private" x86.h, it should never be called by anything other than low level KVM code. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> [sean: rebased to a series without __ex()/__kvm_handle_fault_on_reboot()] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210809173955.1710866-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
87e99d7d |
|
22-Jun-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper Get LA57 from the role_regs, which are initialized from the vCPU even though TDP is enabled, instead of pulling the value directly from the vCPU when computing the guest's root_level for TDP MMUs. Note, the check is inside an is_long_mode() statement, so that requirement is not lost. Use role_regs even though the MMU's role is available and arguably "better". A future commit will consolidate the guest root level logic, and it needs access to EFER.LMA, which is not tracked in the role (it can't be toggled on VM-Exit, unlike LA57). Drop is_la57_mode() as there are no remaining users, and to discourage pulling MMU state from the vCPU (in the future). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-41-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bc908e09 |
|
04-May-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Consolidate guest enter/exit logic to common helpers Move the enter/exit logic in {svm,vmx}_vcpu_enter_exit() to common helpers. Opportunistically update the somewhat stale comment about the updates needing to occur immediately after VM-Exit. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210505002735.1684165-9-seanjc@google.com
|
#
27b4a9c4 |
|
21-Apr-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Rename GPR accessors to make mode-aware variants the defaults Append raw to the direct variants of kvm_register_read/write(), and drop the "l" from the mode-aware variants. I.e. make the mode-aware variants the default, and make the direct variants scary sounding so as to discourage use. Accessing the full 64-bit values irrespective of mode is rarely the desired behavior. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
cb6a32c2 |
|
02-Mar-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Handle triple fault in L2 without killing L1 Synthesize a nested VM-Exit if L2 triggers an emulated triple fault instead of exiting to userspace, which likely will kill L1. Any flow that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that is _not_intercepted by L1. E.g. if KVM is intercepting #GPs for the VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF will cause KVM to emulate triple fault. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210302174515.2812275-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
648fc8ae |
|
03-Feb-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Move nVMX's consistency check macro to common code Move KVM's CC() macro to x86.h so that it can be reused by nSVM. Debugging VM-Enter is as painful on SVM as it is on VMX. Rename the more visible macro to KVM_NESTED_VMENTER_CONSISTENCY_CHECK to avoid any collisions with the uber-concise "CC". Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ecaf088f |
|
26-Mar-2021 |
Dongli Zhang <dongli.zhang@oracle.com> |
KVM: x86: remove unused declaration of kvm_write_tsc() kvm_write_tsc() was renamed and made static since commit 0c899c25d754 ("KVM: x86: do not attempt TSC synchronization on guest writes"). Remove its unused declaration. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Message-Id: <20210326070334.12310-1-dongli.zhang@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
629b5348 |
|
28-Jun-2018 |
Joao Martins <joao.m.martins@oracle.com> |
KVM: x86/xen: update wallclock region Wallclock on Xen is written in the shared_info page. To that purpose, export kvm_write_wall_clock() and pass on the GPA of its location to populate the shared_info wall clock data. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
|
#
d89d04ab |
|
02-Feb-2021 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: move EXIT_FASTPATH_REENTER_GUEST to common code Now that KVM is using static calls, calling vmx_vcpu_run and vmx_sync_pir_to_irr does not incur anymore the cost of a retpoline. Therefore there is no need anymore to handle EXIT_FASTPATH_REENTER_GUEST in vendor code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b3646477 |
|
14-Jan-2021 |
Jason Baron <jbaron@akamai.com> |
KVM: x86: use static calls to reduce kvm_x86_ops overhead Convert kvm_x86_ops to use static calls. Note that all kvm_x86_ops are covered here except for 'pmu_ops and 'nested ops'. Here are some numbers running cpuid in a loop of 1 million calls averaged over 5 runs, measured in the vm (lower is better). Intel Xeon 3000MHz: |default |mitigations=off ------------------------------------- vanilla |.671s |.486s static call|.573s(-15%)|.458s(-6%) AMD EPYC 2500MHz: |default |mitigations=off ------------------------------------- vanilla |.710s |.609s static call|.664s(-6%) |.609s(0%) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Message-Id: <e057bf1b8a7ad15652df6eeba3f907ae758d3399.1610680941.git.jbaron@akamai.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4aa2691d |
|
26-Jan-2021 |
Wei Huang <wei.huang2@amd.com> |
KVM: x86: Factor out x86 instruction emulation with decoding Move the instruction decode part out of x86_emulate_instruction() for it to be used in other places. Also kvm_clear_exception_queue() is moved inside the if-statement as it doesn't apply when KVM are coming back from userspace. Co-developed-by: Bandan Das <bsd@redhat.com> Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Message-Id: <20210126081831.570253-2-wei.huang2@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d855066f |
|
07-Jan-2021 |
Like Xu <like.xu@linux.intel.com> |
KVM: VMX: read/write MSR_IA32_DEBUGCTLMSR from GUEST_IA32_DEBUGCTL SVM already has specific handlers of MSR_IA32_DEBUGCTLMSR in the svm_get/set_msr, so the x86 common part can be safely moved to VMX. This allows KVM to store the bits it supports in GUEST_IA32_DEBUGCTL. Add vmx_supported_debugctl() to refactor the throwing logic of #GP. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210108013704.134985-2-like.xu@linux.intel.com> [Merge parts of Chenyi Qiang's "KVM: X86: Expose bus lock debug exception to guest". - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4683d758 |
|
01-Feb-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Supplement __cr4_reserved_bits() with X86_FEATURE_PCID check Commit 7a873e455567 ("KVM: selftests: Verify supported CR4 bits can be set before KVM_SET_CPUID2") reveals that KVM allows to set X86_CR4_PCIDE even when PCID support is missing: ==== Test Assertion Failure ==== x86_64/set_sregs_test.c:41: rc pid=6956 tid=6956 - Invalid argument 1 0x000000000040177d: test_cr4_feature_bit at set_sregs_test.c:41 2 0x00000000004014fc: main at set_sregs_test.c:119 3 0x00007f2d9346d041: ?? ??:0 4 0x000000000040164d: _start at ??:? KVM allowed unsupported CR4 bit (0x20000) Add X86_FEATURE_PCID feature check to __cr4_reserved_bits() to make kvm_is_valid_cr4() fail. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210201142843.108190-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
86137773 |
|
10-Dec-2020 |
Tom Lendacky <thomas.lendacky@amd.com> |
KVM: SVM: Provide support for SEV-ES vCPU loading An SEV-ES vCPU requires additional VMCB vCPU load/put requirements. SEV-ES hardware will restore certain registers on VMEXIT, but not save them on VMRUN (see Table B-3 and Table B-4 of the AMD64 APM Volume 2), so make the following changes: General vCPU load changes: - During vCPU loading, perform a VMSAVE to the per-CPU SVM save area and save the current values of XCR0, XSS and PKRU to the per-CPU SVM save area as these registers will be restored on VMEXIT. General vCPU put changes: - Do not attempt to restore registers that SEV-ES hardware has already restored on VMEXIT. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <019390e9cb5e93cd73014fa5a040c17d42588733.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7ed9abfe |
|
10-Dec-2020 |
Tom Lendacky <thomas.lendacky@amd.com> |
KVM: SVM: Support string IO operations for an SEV-ES guest For an SEV-ES guest, string-based port IO is performed to a shared (un-encrypted) page so that both the hypervisor and guest can read or write to it and each see the contents. For string-based port IO operations, invoke SEV-ES specific routines that can complete the operation using common KVM port IO support. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <9d61daf0ffda496703717218f415cdc8fd487100.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
8f423a80 |
|
10-Dec-2020 |
Tom Lendacky <thomas.lendacky@amd.com> |
KVM: SVM: Support MMIO for an SEV-ES guest For an SEV-ES guest, MMIO is performed to a shared (un-encrypted) page so that both the hypervisor and guest can read or write to it and each see the contents. The GHCB specification provides software-defined VMGEXIT exit codes to indicate a request for an MMIO read or an MMIO write. Add support to recognize the MMIO requests and invoke SEV-ES specific routines that can complete the MMIO operation. These routines use common KVM support to complete the MMIO operation. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <af8de55127d5bcc3253d9b6084a0144c12307d4d.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3f1a18b9 |
|
29-Oct-2020 |
Uros Bizjak <ubizjak@gmail.com> |
KVM/VMX/SVM: Move kvm_machine_check function to x86.h Move kvm_machine_check to x86.h to avoid two exact copies of the same function in kvm.c and svm.c. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20201029135600.122392-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ee69c92b |
|
06-Oct-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Return bool instead of int for CR4 and SREGS validity checks Rework the common CR4 and SREGS checks to return a bool instead of an int, i.e. true/false instead of 0/-EINVAL, and add "is" to the name to clarify the polarity of the return value (which is effectively inverted by this change). No functional changed intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201007014417.29276-6-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
cc4cb017 |
|
01-Nov-2020 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: use positive error values for msr emulation that causes #GP Recent introduction of the userspace msr filtering added code that uses negative error codes for cases that result in either #GP delivery to the guest, or handled by the userspace msr filtering. This breaks an assumption that a negative error code returned from the msr emulation code is a semi-fatal error which should be returned to userspace via KVM_RUN ioctl and usually kill the guest. Fix this by reusing the already existing KVM_MSR_RET_INVALID error code, and by adding a new KVM_MSR_RET_FILTERED error code for the userspace filtered msrs. Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace") Reported-by: Qian Cai <cai@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201101115523.115780-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
51de8151 |
|
25-Sep-2020 |
Alexander Graf <graf@amazon.com> |
KVM: x86: Add infrastructure for MSR filtering In the following commits we will add pieces of MSR filtering. To ensure that code compiles even with the feature half-merged, let's add a few stubs and struct definitions before the real patches start. Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-4-graf@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9715092f |
|
11-Sep-2020 |
Babu Moger <babu.moger@amd.com> |
KVM: X86: Move handling of INVPCID types to x86 INVPCID instruction handling is mostly same across both VMX and SVM. So, move the code to common x86.c. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985255212.11252.10322694343971983487.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3f3393b3 |
|
11-Sep-2020 |
Babu Moger <babu.moger@amd.com> |
KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c Handling of kvm_read/write_guest_virt*() errors can be moved to common code. The same code can be used by both VMX and SVM. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985254493.11252.6603092560732507607.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
68ca7663 |
|
10-Sep-2020 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: LAPIC: Narrow down the kick target vCPU The kick after setting KVM_REQ_PENDING_TIMER is used to handle the timer fires on a different pCPU which vCPU is running on. This kick costs about 1000 clock cycles and we don't need this when injecting already-expired timer or when using the VMX preemption timer because kvm_lapic_expired_hv_timer() is called from the target vCPU. Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1599731444-3525-6-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
89786147 |
|
10-Jul-2020 |
Mohammed Gamal <mgamal@redhat.com> |
KVM: x86: Add helper functions for illegal GPA checking and page fault injection This patch adds two helper functions that will be used to support virtualizing MAXPHYADDR in both kvm-intel.ko and kvm.ko. kvm_fixup_and_inject_pf_error() injects a page fault for a user-specified GVA, while kvm_mmu_is_illegal_gpa() checks whether a GPA exceeds vCPU address limits. Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200710154811.418214-2-mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
841c2be0 |
|
08-Jul-2020 |
Maxim Levitsky <mlevitsk@redhat.com> |
kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host To avoid complex and in some cases incorrect logic in kvm_spec_ctrl_test_value, just try the guest's given value on the host processor instead, and if it doesn't #GP, allow the guest to set it. One such case is when host CPU supports STIBP mitigation but doesn't support IBRS (as is the case with some Zen2 AMD cpus), and in this case we were giving guest #GP when it tried to use STIBP The reason why can can do the host test is that IA32_SPEC_CTRL msr is passed to the guest, after the guest sets it to a non zero value for the first time (due to performance reasons), and as as result of this, it is pointless to emulate #GP condition on this first access, in a different way than what the host CPU does. This is based on a patch from Sean Christopherson, who suggested this idea. Fixes: 6441fa6178f5 ("KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200708115731.180097-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
761e4169 |
|
07-Jul-2020 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests According to section "Canonicalization and Consistency Checks" in APM vol. 2 the following guest state is illegal: "Any MBZ bit of CR3 is set." "Any MBZ bit of CR4 is set." Suggeted-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <1594168797-29444-3-git-send-email-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
53efe527 |
|
08-Jul-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: Make CR4.VMXE reserved for the guest CR4.VMXE is reserved unless the VMX CPUID bit is set. On Intel, it is also tested by vmx_set_cr4, but AMD relies on kvm_valid_cr4, so fix it. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b899c132 |
|
07-Jul-2020 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
KVM: x86: Create mask for guest CR4 reserved bits in kvm_update_cpuid() Instead of creating the mask for guest CR4 reserved bits in kvm_valid_cr4(), do it in kvm_update_cpuid() so that it can be reused instead of creating it each time kvm_valid_cr4() is called. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <1594168797-29444-2-git-send-email-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f5f6145e |
|
22-May-2020 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
KVM: x86: Move the check for upper 32 reserved bits of DR6 to separate function Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20200522221954.32131-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6abe9c13 |
|
22-Jun-2020 |
Peter Xu <peterx@redhat.com> |
KVM: X86: Move ignore_msrs handling upper the stack MSR accesses can be one of: (1) KVM internal access, (2) userspace access (e.g., via KVM_SET_MSRS ioctl), (3) guest access. The ignore_msrs was previously handled by kvm_get_msr_common() and kvm_set_msr_common(), which is the bottom of the msr access stack. It's working in most cases, however it could dump unwanted warning messages to dmesg even if kvm get/set the msrs internally when calling __kvm_set_msr() or __kvm_get_msr() (e.g. kvm_cpuid()). Ideally we only want to trap cases (2) or (3), but not (1) above. To achieve this, move the ignore_msrs handling upper until the callers of __kvm_get_msr() and __kvm_set_msr(). To identify the "msr missing" event, a new return value (KVM_MSR_RET_INVALID==2) is used for that. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200622220442.21998-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
404d5d7b |
|
28-Apr-2020 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: X86: Introduce more exit_fastpath_completion enum values Adds a fastpath_t typedef since enum lines are a bit long, and replace EXIT_FASTPATH_SKIP_EMUL_INS with two new exit_fastpath_completion enum values. - EXIT_FASTPATH_EXIT_HANDLED kvm will still go through it's full run loop, but it would skip invoking the exit handler. - EXIT_FASTPATH_REENTER_GUEST complete fastpath, guest can be re-entered without invoking the exit handler or going back to vcpu_run Tested-by: Haiwei Li <lihaiwei@tencent.com> Cc: Haiwei Li <lihaiwei@tencent.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1588055009-12677-4-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
5a9f5443 |
|
28-Apr-2020 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: X86: Introduce kvm_vcpu_exit_request() helper Introduce kvm_vcpu_exit_request() helper, we need to check some conditions before enter guest again immediately, we skip invoking the exit handler and go through full run loop if complete fastpath but there is stuff preventing we enter guest again immediately. Tested-by: Haiwei Li <lihaiwei@tencent.com> Cc: Haiwei Li <lihaiwei@tencent.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1588055009-12677-5-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
eeeb4f67 |
|
20-Mar-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASID Add KVM_REQ_TLB_FLUSH_CURRENT to allow optimized TLB flushing of VMX's EPTP/VPID contexts[*] from the KVM MMU and/or in a deferred manner, e.g. to flush L2's context during nested VM-Enter. Convert KVM_REQ_TLB_FLUSH to KVM_REQ_TLB_FLUSH_CURRENT in flows where the flush is directly associated with vCPU-scoped instruction emulation, i.e. MOV CR3 and INVPCID. Add a comment in vmx_vcpu_load_vmcs() above its KVM_REQ_TLB_FLUSH to make it clear that it deliberately requests a flush of all contexts. Service any pending flush request on nested VM-Exit as it's possible a nested VM-Exit could occur after requesting a flush for L2. Add the same logic for nested VM-Enter even though it's _extremely_ unlikely for flush to be pending on nested VM-Enter, but theoretically possible (in the future) due to RSM (SMM) emulation. [*] Intel also has an Address Space Identifier (ASID) concept, e.g. EPTP+VPID+PCID == ASID, it's just not documented in the SDM because the rules of invalidation are different based on which piece of the ASID is being changed, i.e. whether the EPTP, VPID, or PCID context must be invalidated. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-25-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
afaf0b2f |
|
21-Mar-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection Replace the kvm_x86_ops pointer in common x86 with an instance of the struct to save one pointer dereference when invoking functions. Copy the struct by value to set the ops during kvm_init(). Arbitrarily use kvm_x86_ops.hardware_enable to track whether or not the ops have been initialized, i.e. a vendor KVM module has been loaded. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200321202603.19355-7-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
408e9a31 |
|
05-Mar-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: CPUID: add support for supervisor states Current CPUID 0xd enumeration code does not support supervisor states, because KVM only supports setting IA32_XSS to zero. Change it instead to use a new variable supported_xss, to be set from the hardware_setup callback which is in charge of CPU capabilities. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
615a4ae1 |
|
02-Mar-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Make kvm_mpx_supported() an inline function Expose kvm_mpx_supported() as a static inline so that it can be inlined in kvm_intel.ko. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
cfc48181 |
|
02-Mar-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Calculate the supported xcr0 mask at load time Add a new global variable, supported_xcr0, to track which xcr0 bits can be exposed to the guest instead of calculating the mask on every call. The supported bits are constant for a given instance of KVM. This paves the way toward eliminating the ->mpx_supported() call in kvm_mpx_supported(), e.g. eliminates multiple retpolines in VMX's nested VM-Enter path, and eventually toward eliminating ->mpx_supported() altogether. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2f728d66 |
|
18-Feb-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Move kvm_emulate.h into KVM's private directory Now that the emulation context is dynamically allocated and not embedded in struct kvm_vcpu, move its header, kvm_emulate.h, out of the public asm directory and into KVM's private x86 directory. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f0ed4760 |
|
18-Feb-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Move emulation-only helpers to emulate.c Move ctxt_virt_addr_bits() and emul_is_noncanonical_address() from x86.h to emulate.c. This eliminates all references to struct x86_emulate_ctxt from x86.h, and sets the stage for a future patch to stop including kvm_emulate.h in asm/kvm_host.h. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9b5e8532 |
|
24-Jan-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Take a u64 when checking for a valid dr7 value Take a u64 instead of an unsigned long in kvm_dr7_valid() to fix a build warning on i386 due to right-shifting a 32-bit value by 32 when checking for bits being set in dr7[63:32]. Alternatively, the warning could be resolved by rewriting the check to use an i386-friendly method, but taking a u64 fixes another oddity on 32-bit KVM. Beause KVM implements natural width VMCS fields as u64s to avoid layout issues between 32-bit and 64-bit, a devious guest can stuff vmcs12->guest_dr7 with a 64-bit value even when both the guest and host are 32-bit kernels. KVM eventually drops vmcs12->guest_dr7[63:32] when propagating vmcs12->guest_dr7 to vmcs02, but ideally KVM would not rely on that behavior for correctness. Cc: Jim Mattson <jmattson@google.com> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Fixes: ecb697d10f70 ("KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b91991bf |
|
15-Jan-2020 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests According to section "Checks on Guest Control Registers, Debug Registers, and and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry of nested guests: If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7 field must be 0. In KVM, GUEST_DR7 is set prior to the vmcs02 VM-entry by kvm_set_dr() and the latter synthesizes a #GP if any bit in the high dword in the former is set. Hence this field needs to be checked in software. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
de761ea7 |
|
15-Jan-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Perform non-canonical checks in 32-bit KVM Remove the CONFIG_X86_64 condition from the low level non-canonical helpers to effectively enable non-canonical checks on 32-bit KVM. Non-canonical checks are performed by hardware if the CPU *supports* 64-bit mode, whether or not the CPU is actually in 64-bit mode is irrelevant. For the most part, skipping non-canonical checks on 32-bit KVM is ok-ish because 32-bit KVM always (hopefully) drops bits 63:32 of whatever value it's checking before propagating it to hardware, and architecturally, the expected behavior for the guest is a bit of a grey area since the vCPU itself doesn't support 64-bit mode. I.e. a 32-bit KVM guest can observe the missed checks in several paths, e.g. INVVPID and VM-Enter, but it's debatable whether or not the missed checks constitute a bug because technically the vCPU doesn't support 64-bit mode. The primary motivation for enabling the non-canonical checks is defense in depth. As mentioned above, a guest can trigger a missed check via INVVPID or VM-Enter. INVVPID is straightforward as it takes a 64-bit virtual address as part of its 128-bit INVVPID descriptor and fails if the address is non-canonical, even if INVVPID is executed in 32-bit PM. Nested VM-Enter is a bit more convoluted as it requires the guest to write natural width VMCS fields via memory accesses and then VMPTRLD the VMCS, but it's still possible. In both cases, KVM is saved from a true bug only because its flows that propagate values to hardware (correctly) take "unsigned long" parameters and so drop bits 63:32 of the bad value. Explicitly performing the non-canonical checks makes it less likely that a bad value will be propagated to hardware, e.g. in the INVVPID case, if __invvpid() didn't implicitly drop bits 63:32 then KVM would BUG() on the resulting unexpected INVVPID failure due to hardware rejecting the non-canonical address. The only downside to enabling the non-canonical checks is that it adds a relatively small amount of overhead, but the affected flows are not hot paths, i.e. the overhead is negligible. Note, KVM technically could gate the non-canonical checks on 32-bit KVM with static_cpu_has(X86_FEATURE_LM), but on bare metal that's an even bigger waste of code for everyone except the 0.00000000000001% of the population running on Yonah, and nested 32-bit on 64-bit already fudges things with respect to 64-bit CPU behavior. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Also do so in nested_vmx_check_host_state as reported by Krish. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6441fa61 |
|
20-Jan-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL If the guest is configured to have SPEC_CTRL but the host does not (which is a nonsensical configuration but these are not explicitly forbidden) then a host-initiated MSR write can write vmx->spec_ctrl (respectively svm->spec_ctrl) and trigger a #GP when KVM tries to restore the host value of the MSR. Add a more comprehensive check for valid bits of SPEC_CTRL, covering host CPUID flags and, since we are at it and it is more correct that way, guest CPUID flags too. For AMD, remove the unnecessary is_guest_mode check around setting the MSR interception bitmap, so that the code looks the same as for Intel. Cc: Jim Mattson <jmattson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a0a2260c |
|
17-Dec-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Move bit() helper to cpuid.h Move bit() to cpuid.h in preparation for incorporating the reverse_cpuid array in bit() build-time assertions. Opportunistically use the BIT() macro instead of open-coding the shift. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1e9e2622 |
|
20-Nov-2019 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath ICR and TSCDEADLINE MSRs write cause the main MSRs write vmexits in our product observation, multicast IPIs are not as common as unicast IPI like RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR etc. This patch introduce a mechanism to handle certain performance-critical WRMSRs in a very early stage of KVM VMExit handler. This mechanism is specifically used for accelerating writes to x2APIC ICR that attempt to send a virtual IPI with physical destination-mode, fixed delivery-mode and single target. Which was found as one of the main causes of VMExits for Linux workloads. The reason this mechanism significantly reduce the latency of such virtual IPIs is by sending the physical IPI to the target vCPU in a very early stage of KVM VMExit handler, before host interrupts are enabled and before expensive operations such as reacquiring KVM’s SRCU lock. Latency is reduced even more when KVM is able to use APICv posted-interrupt mechanism (which allows to deliver the virtual IPI directly to target vCPU without the need to kick it to host). Testing on Xeon Skylake server: The virtual IPI latency from sender send to receiver receive reduces more than 200+ cpu cycles. Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
736c291c |
|
06-Dec-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVM Convert a plethora of parameters and variables in the MMU and page fault flows from type gva_t to gpa_t to properly handle TDP on 32-bit KVM. Thanks to PSE and PAE paging, 32-bit kernels can access 64-bit physical addresses. When TDP is enabled, the fault address is a guest physical address and thus can be a 64-bit value, even when both KVM and its guest are using 32-bit virtual addressing, e.g. VMX's VMCS.GUEST_PHYSICAL is a 64-bit field, not a natural width field. Using a gva_t for the fault address means KVM will incorrectly drop the upper 32-bits of the GPA. Ditto for gva_to_gpa() when it is used to translate L2 GPAs to L1 GPAs. Opportunistically rename variables and parameters to better reflect the dual address modes, e.g. use "cr2_or_gpa" for fault addresses and plain "addr" instead of "vaddr" when the address may be either a GVA or an L2 GPA. Similarly, use "gpa" in the nonpaging_page_fault() flows to avoid a confusing "gpa_t gva" declaration; this also sets the stage for a future patch to combing nonpaging_page_fault() and tdp_page_fault() with minimal churn. Sprinkle in a few comments to document flows where an address is known to be a GVA and thus can be safely truncated to a 32-bit value. Add WARNs in kvm_handle_page_fault() and FNAME(gva_to_gpa_nested)() to help document such cases and detect bugs. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
27cbe7d6 |
|
11-Nov-2019 |
Liran Alon <liran.alon@oracle.com> |
KVM: x86: Prevent set vCPU into INIT/SIPI_RECEIVED state when INIT are latched Commit 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") fixed KVM to also latch pending LAPIC INIT event when vCPU is in VMX operation. However, current API of KVM_SET_MP_STATE allows userspace to put vCPU into KVM_MP_STATE_SIPI_RECEIVED or KVM_MP_STATE_INIT_RECEIVED even when vCPU is in VMX operation. Fix this by introducing a util method to check if vCPU state latch INIT signals and use it in KVM_SET_MP_STATE handler. Fixes: 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") Reported-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
139a12cf |
|
21-Oct-2019 |
Aaron Lewis <aaronlewis@google.com> |
KVM: x86: Move IA32_XSS-swapping on VM-entry/VM-exit to common x86 code Hoist the vendor-specific code related to loading the hardware IA32_XSS MSR with guest/host values on VM-entry/VM-exit to common x86 code. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: Ic6e3430833955b98eb9b79ae6715cf2a3fdd6d82 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
489cbcf0 |
|
27-Sep-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Add WARNs to detect out-of-bounds register indices Add WARN_ON_ONCE() checks in kvm_register_{read,write}() to detect reg values that would cause KVM to overflow vcpu->arch.regs. Change the reg param to an 'int' to make it clear that the reg index is unverified. Regarding the overhead of WARN_ON_ONCE(), now that all fixed GPR reads and writes use dedicated accessors, e.g. kvm_rax_read(), the overhead is limited to flows where the reg index is generated at runtime. And there is at least one historical bug where KVM has generated an out-of- bounds access to arch.regs (see commit b68f3cc7d9789, "KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels"). Adding the WARN_ON_ONCE() protection paves the way for additional cleanup related to kvm_reg and kvm_reg_ex. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9497e1f2 |
|
27-Aug-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Move triple fault request into RM int injection Request triple fault in kvm_inject_realmode_interrupt() instead of returning EMULATE_FAIL and deferring to the caller. All existing callers request triple fault and it's highly unlikely Real Mode is going to acquire new features. While this consolidates a small amount of code, the real goal is to remove the last reference to EMULATE_FAIL. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
871bd034 |
|
01-Aug-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Rename access permissions cache member in struct kvm_vcpu_arch Rename "access" to "mmio_access" to match the other MMIO cache members and to make it more obvious that it's tracking the access permissions for the MMIO cache. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0c5f81da |
|
05-Jul-2019 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: LAPIC: Inject timer interrupt via posted interrupt Dedicated instances are currently disturbed by unnecessary jitter due to the emulated lapic timers firing on the same pCPUs where the vCPUs reside. There is no hardware virtual timer on Intel for guest like ARM, so both programming timer in guest and the emulated timer fires incur vmexits. This patch tries to avoid vmexit when the emulated timer fires, at least in dedicated instance scenario when nohz_full is enabled. In that case, the emulated timers can be offload to the nearest busy housekeeping cpus since APICv has been found for several years in server processors. The guest timer interrupt can then be injected via posted interrupts, which are delivered by the housekeeping cpu once the emulated timer fires. The host should tuned so that vCPUs are placed on isolated physical processors, and with several pCPUs surplus for busy housekeeping. If disabled mwait/hlt/pause vmexits keep the vCPUs in non-root mode, ~3% redis performance benefit can be observed on Skylake server, and the number of external interrupt vmexits drops substantially. Without patch VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% ) While with patch: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% ) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bf03d4f9 |
|
06-Jun-2019 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: introduce is_pae_paging Checking for 32-bit PAE is quite common around code that fiddles with the PDPTRs. Add a function to compress all checks into a single invocation. Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b5170063 |
|
21-May-2019 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: X86: Provide a capability to disable cstate msr read intercepts Allow guest reads CORE cstate when exposing host CPU power management capabilities to the guest. PKG cstate is restricted to avoid a guest to get the whole package information in multi-tenant scenario. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
39497d76 |
|
17-Apr-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: lapic: Track lapic timer advance per vCPU Automatically adjusting the globally-shared timer advancement could corrupt the timer, e.g. if multiple vCPUs are concurrently adjusting the advancement value. That could be partially fixed by using a local variable for the arithmetic, but it would still be susceptible to a race when setting timer_advance_adjust_done. And because virtual_tsc_khz and tsc_scaling_ratio are per-vCPU, the correct calibration for a given vCPU may not apply to all vCPUs. Furthermore, lapic_timer_advance_ns is marked __read_mostly, which is effectively violated when finding a stable advancement takes an extended amount of timer. Opportunistically change the definition of lapic_timer_advance_ns to a u32 so that it matches the style of struct kvm_timer. Explicitly pass the param to kvm_create_lapic() so that it doesn't have to be exposed to lapic.c, thus reducing the probability of unintentionally using the global value instead of the per-vCPU value. Cc: Liran Alon <liran.alon@oracle.com> Cc: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
674ea351 |
|
10-Apr-2019 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: optimize check for valid PAT value This check will soon be done on every nested vmentry and vmexit, "parallelize" it using bitwise operations. Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1811d979 |
|
12-Apr-2019 |
WANG Chao <chao.wang@ucloud.cn> |
x86/kvm: move kvm_load/put_guest_xcr0 into atomic context guest xcr0 could leak into host when MCE happens in guest mode. Because do_machine_check() could schedule out at a few places. For example: kvm_load_guest_xcr0 ... kvm_x86_ops->run(vcpu) { vmx_vcpu_run vmx_complete_atomic_exit kvm_machine_check do_machine_check do_memory_failure memory_failure lock_page In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule out, host cpu has guest xcr0 loaded (0xff). In __switch_to { switch_fpu_finish copy_kernel_to_fpregs XRSTORS If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in and tries to reinitialize fpu by restoring init fpu state. Same story as last #GP, except we get DOUBLE FAULT this time. Cc: stable@vger.kernel.org Signed-off-by: WANG Chao <chao.wang@ucloud.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
361209e0 |
|
05-Feb-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: Explicitly define the "memslot update in-progress" bit KVM uses bit 0 of the memslots generation as an "update in-progress" flag, which is used by x86 to prevent caching MMIO access while the memslots are changing. Although the intended behavior is flag-like, e.g. MMIO sptes intentionally drop the in-progress bit so as to avoid caching data from in-flux memslots, the implementation oftentimes treats the bit as part of the generation number itself, e.g. incrementing the generation increments twice, once to set the flag and once to clear it. Prior to commit 4bd518f1598d ("KVM: use separate generations for each address space"), incorporating the "update in-progress" bit into the generation number largely made sense, e.g. "real" generations are even, "bogus" generations are odd, most code doesn't need to be aware of the bit, etc... Now that unique memslots generation numbers are assigned to each address space, stealthing the in-progress status into the generation number results in a wide variety of subtle code, e.g. kvm_create_vm() jumps over bit 0 when initializing the memslots generation without any hint as to why. Explicitly define the flag and convert as much code as possible (which isn't much) to actually treat it like a flag. This paves the way for eventually using a different bit for "update in-progress" so that it can be a flag in truth instead of a awkward extension to the generation number. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ddfd1730 |
|
05-Feb-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: x86/mmu: Do not cache MMIO accesses while memslots are in flux When installing new memslots, KVM sets bit 0 of the generation number to indicate that an update is in-progress. Until the update is complete, there are no guarantees as to whether a vCPU will see the old or the new memslots. Explicity prevent caching MMIO accesses so as to avoid using an access cached from the old memslots after the new memslots have been installed. Note that it is unclear whether or not disabling caching during the update window is strictly necessary as there is no definitive documentation as to what ordering guarantees KVM provides with respect to updating memslots. That being said, the MMIO spte code does not allow reusing sptes created while an update is in-progress, and the associated documentation explicitly states: We do not want to use an MMIO sptes created with an odd generation number, ... If KVM is unlucky and creates an MMIO spte while the low bit is 1, the next access to the spte will always be a cache miss. At the very least, disabling the per-vCPU MMIO cache during updates will make its behavior consistent with the MMIO spte behavior and documentation. Fixes: 56f17dd3fbc4 ("kvm: x86: fix stale mmio cache bug") Cc: <stable@vger.kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
da998b46 |
|
16-Oct-2018 |
Jim Mattson <jmattson@google.com> |
kvm: x86: Defer setting of CR2 until #PF delivery When exception payloads are enabled by userspace (which is not yet possible) and a #PF is raised in L2, defer the setting of CR2 until the #PF is delivered. This allows the L1 hypervisor to intercept the fault before CR2 is modified. For backwards compatibility, when exception payloads are not enabled by userspace, kvm_multiple_exception modifies CR2 when the #PF exception is raised. Reported-by: Jim Mattson <jmattson@google.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c60658d1 |
|
23-Aug-2018 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Unexport x86_emulate_instruction() Allowing x86_emulate_instruction() to be called directly has led to subtle bugs being introduced, e.g. not setting EMULTYPE_NO_REEXECUTE in the emulation type. While most of the blame lies on re-execute being opt-out, exporting x86_emulate_instruction() also exposes its cr2 parameter, which may have contributed to commit d391f1207067 ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested") using x86_emulate_instruction() instead of emulate_instruction() because "hey, I have a cr2!", which in turn introduced its EMULTYPE_NO_REEXECUTE bug. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
0447378a |
|
20-Jun-2018 |
Marc Orr <marcorr@google.com> |
kvm: vmx: Nested VM-entry prereqs for event inj. This patch extends the checks done prior to a nested VM entry. Specifically, it extends the check_vmentry_prereqs function with checks for fields relevant to the VM-entry event injection information, as described in the Intel SDM, volume 3. This patch is motivated by a syzkaller bug, where a bad VM-entry interruption information field is generated in the VMCS02, which causes the nested VM launch to fail. Then, KVM fails to resume L1. While KVM should be improved to correctly resume L1 execution after a failed nested launch, this change is justified because the existing code to resume L1 is flaky/ad-hoc and the test coverage for resuming L1 is sparse. Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Marc Orr <marcorr@google.com> [Removed comment whose parts were describing previous revisions and the rest was obvious from function/variable naming. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
ce14e868a |
|
06-Jun-2018 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system Int the next patch the emulator's .read_std and .write_std callbacks will grow another argument, which is not needed in kvm_read_guest_virt and kvm_write_guest_virt_system's callers. Since we have to make separate functions, let's give the currently existing names a nicer interface, too. Fixes: 129a72a0d3c8 ("KVM: x86: Introduce segmented_write_std", 2017-01-12) Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
5e62493f |
|
16-Apr-2018 |
KarimAllah Ahmed <karahmed@amazon.de> |
x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPI Move DISABLE_EXITS KVM capability bits to the UAPI just like the rest of capabilities. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
082d06ed |
|
03-Apr-2018 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: X86: Introduce handle_ud() Introduce handle_ud() to handle invalid opcode, this function will be used by later patches. Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmář <rkrcmar@redhat.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
04140b41 |
|
22-Mar-2018 |
Liran Alon <liran.alon@oracle.com> |
KVM: x86: Rename interrupt.pending to interrupt.injected For exceptions & NMIs events, KVM code use the following coding convention: *) "pending" represents an event that should be injected to guest at some point but it's side-effects have not yet occurred. *) "injected" represents an event that it's side-effects have already occurred. However, interrupts don't conform to this coding convention. All current code flows mark interrupt.pending when it's side-effects have already taken place (For example, bit moved from LAPIC IRR to ISR). Therefore, it makes sense to just rename interrupt.pending to interrupt.injected. This change follows logic of previous commit 664f8e26b00c ("KVM: X86: Fix loss of exception which has not yet been injected") which changed exception to follow this coding convention as well. It is important to note that in case !lapic_in_kernel(vcpu), interrupt.pending usage was and still incorrect. In this case, interrrupt.pending can only be set using one of the following ioctls: KVM_INTERRUPT, KVM_SET_VCPU_EVENTS and KVM_SET_SREGS. Looking at how QEMU uses these ioctls, one can see that QEMU uses them either to re-set an "interrupt.pending" state it has received from KVM (via KVM_GET_VCPU_EVENTS interrupt.pending or via KVM_GET_SREGS interrupt_bitmap) or by dispatching a new interrupt from QEMU's emulated LAPIC which reset bit in IRR and set bit in ISR before sending ioctl to KVM. So it seems that indeed "interrupt.pending" in this case is also suppose to represent "interrupt.injected". However, kvm_cpu_has_interrupt() & kvm_cpu_has_injectable_intr() is misusing (now named) interrupt.injected in order to return if there is a pending interrupt. This leads to nVMX/nSVM not be able to distinguish if it should exit from L2 to L1 on EXTERNAL_INTERRUPT on pending interrupt or should re-inject an injected interrupt. Therefore, add a FIXME at these functions for handling this issue. This patch introduce no semantics change. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
8566ac8b |
|
16-Mar-2018 |
Babu Moger <babu.moger@amd.com> |
KVM: SVM: Implement pause loop exit logic in SVM Bring the PLE(pause loop exit) logic to AMD svm driver. While testing, we found this helping in situations where numerous pauses are generated. Without these patches we could see continuos VMEXITS due to pause interceptions. Tested it on AMD EPYC server with boot parameter idle=poll on a VM with 32 vcpus to simulate extensive pause behaviour. Here are VMEXITS in 10 seconds interval. Pauses 810199 504 Total 882184 325415 Signed-off-by: Babu Moger <babu.moger@amd.com> [Prevented the window from dropping below the initial value. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
c8e88717 |
|
16-Mar-2018 |
Babu Moger <babu.moger@amd.com> |
KVM: VMX: Bring the common code to header file This patch brings some of the code from vmx to x86.h header file. Now, we can share this code between vmx and svm. Modified couple functions to make it common. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
dd60d217 |
|
25-Jul-2017 |
Andi Kleen <ak@linux.intel.com> |
KVM: x86: Fix perf timer mode IP reporting KVM and perf have a special backdoor mechanism to report the IP for interrupts re-executed after vm exit. This works for the NMIs that perf normally uses. However when perf is in timer mode it doesn't work because the timer interrupt doesn't get this special treatment. This is common when KVM is running nested in another hypervisor which may not implement the PMU, so only timer mode is available. Call the functions to set up the backdoor IP also for non NMI interrupts. I renamed the functions to set up the backdoor IP reporting to be more appropiate for their new use. The SVM change is only compile tested. v2: Moved the functions inline. For the normal interrupt case the before/after functions are now called from x86.c, not arch specific code. For the NMI case we still need to call it in the architecture specific code, because it's already needed in the low level *_run functions. Signed-off-by: Andi Kleen <ak@linux.intel.com> [Removed unnecessary calls from arch handle_external_intr. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
b31c114b |
|
12-Mar-2018 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: X86: Provide a capability to disable PAUSE intercepts Allow to disable pause loop exit/pause filtering on a per VM basis. If some VMs have dedicated host CPUs, they won't be negatively affected due to needlessly intercepted PAUSE instructions. Thanks to Jan H. Schönherr's initial patch. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
caa057a2 |
|
12-Mar-2018 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: X86: Provide a capability to disable HLT intercepts If host CPUs are dedicated to a VM, we can avoid VM exits on HLT. This patch adds the per-VM capability to disable them. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4d5422ce |
|
12-Mar-2018 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: X86: Provide a capability to disable MWAIT intercepts Allowing a guest to execute MWAIT without interception enables a guest to put a (physical) CPU into a power saving state, where it takes longer to return from than what may be desired by the host. Don't give a guest that power over a host by default. (Especially, since nothing prevents a guest from using MWAIT even when it is not advertised via CPUID.) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c4ae60e4 |
|
12-Mar-2018 |
Liran Alon <liran.alon@oracle.com> |
KVM: x86: Add module parameter for supporting VMware backdoor Support access to VMware backdoor requires KVM to intercept #GP exceptions from guest which introduce slight performance hit. Therefore, control this support by module parameter. Note that module parameter is exported as it should be consumed by kvm_intel & kvm_amd to determine if they should intercept #GP or not. This commit doesn't change semantics. It is done as a preparation for future commits. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
5c7d4f9a |
|
19-Nov-2017 |
Liran Alon <liran.alon@oracle.com> |
KVM: nVMX: Fix bug of injecting L2 exception into L1 kvm_clear_exception_queue() should clear pending exception. This also includes exceptions which were only marked pending but not yet injected. This is because exception.pending is used for both L1 and L2 to determine if an exception should be raised to guest. Note that an exception which is pending but not yet injected will be raised again once the guest will be resumed. Consider the following scenario: 1) L0 KVM with ignore_msrs=false. 2) L1 prepare vmcs12 with the following: a) No intercepts on MSR (MSR_BITMAP exist and is filled with 0). b) No intercept for #GP. c) vmx-preemption-timer is configured. 3) L1 enters into L2. 4) L2 reads an unhandled MSR that exists in MSR_BITMAP (such as 0x1fff). L2 RDMSR could be handled as described below: 1) L2 exits to L0 on RDMSR and calls handle_rdmsr(). 2) handle_rdmsr() calls kvm_inject_gp() which sets KVM_REQ_EVENT, exception.pending=true and exception.injected=false. 3) vcpu_enter_guest() consumes KVM_REQ_EVENT and calls inject_pending_event() which calls vmx_check_nested_events() which sees that exception.pending=true but nested_vmx_check_exception() returns 0 and therefore does nothing at this point. However let's assume it later sees vmx-preemption-timer expired and therefore exits from L2 to L1 by calling nested_vmx_vmexit(). 4) nested_vmx_vmexit() calls prepare_vmcs12() which calls vmcs12_save_pending_event() but it does nothing as exception.injected is false. Also prepare_vmcs12() calls kvm_clear_exception_queue() which does nothing as exception.injected is already false. 5) We now return from vmx_check_nested_events() with 0 while still having exception.pending=true! 6) Therefore inject_pending_event() continues and we inject L2 exception to L1!... This commit will fix above issue by changing step (4) to clear exception.pending in kvm_clear_exception_queue(). Fixes: 664f8e26b00c ("KVM: X86: Fix loss of exception which has not yet been injected") Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
431f5d44 |
|
29-Nov-2017 |
Radim Krčmář <rkrcmar@redhat.com> |
KVM: x86: simplify kvm_mwait_in_guest() If Intel/AMD implements MWAIT, we expect that it works well and only reject known bugs; no reason to do it the other way around for minor vendors. (Not that they are relevant ATM.) This allows further simplification of kvm_mwait_in_guest(). And use boot_cpu_has() instead of "cpu_has(&boot_cpu_data," while at it. Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
346f48fa |
|
29-Nov-2017 |
Radim Krčmář <rkrcmar@redhat.com> |
KVM: x86: drop bogus MWAIT check The check was added in some iteration while trying to fix a reported OS X on Core 2 bug, but that bug is elsewhere. The comment is misleading because the guest can call MWAIT with ECX = 0 even if we enforce CPUID5_ECX_INTERRUPT_BREAK; the call would have the exactly the same effect as if the host didn't have the feature. A problem is that a QEMU feature exposes CPUID5_ECX_INTERRUPT_BREAK on CPUs that do not support it. Removing the check changes behavior on last Pentium 4 lines (Presler, Dempsey, and Tulsa, which had VMX and MONITOR while missing INTERRUPT_BREAK) when running a guest OS that uses MWAIT without checking for its presence (QEMU doesn't expose MONITOR). The only known OS that ignores the MONITOR flag is old Mac OS X and we allowed it to bug on Core 2 (MWAIT used to throw #UD and only that OS noticed), so we can save another 20 lines letting it bug on even older CPUs. Alternatively, we can return MWAIT exiting by default and let userspace toggle it. Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2a140f3b |
|
29-Nov-2017 |
Radim Krčmář <rkrcmar@redhat.com> |
KVM: x86: prevent MWAIT in guest with buggy MONITOR The bug prevents MWAIT from waking up after a write to the monitored cache line. KVM might emulate a CPU model that shouldn't have the bug, so the guest would not employ a workaround and possibly miss wakeups. Better to avoid the situation. Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b2441318 |
|
01-Nov-2017 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
664f8e26 |
|
24-Aug-2017 |
Wanpeng Li <wanpeng.li@hotmail.com> |
KVM: X86: Fix loss of exception which has not yet been injected vmx_complete_interrupts() assumes that the exception is always injected, so it can be dropped by kvm_clear_exception_queue(). However, an exception cannot be injected immediately if it is: 1) originally destined to a nested guest; 2) trapped to cause a vmexit; 3) happening right after VMLAUNCH/VMRESUME, i.e. when nested_run_pending is true. This patch applies to exceptions the same algorithm that is used for NMIs, replacing exception.reinject with "exception.injected" (equivalent to nmi_injected). exception.pending now represents an exception that is queued and whose side effects (e.g., update RFLAGS.RF or DR7) have not been applied yet. If exception.pending is true, the exception might result in a nested vmexit instead, too (in which case the side effects must not be applied). exception.injected instead represents an exception that is going to be injected into the guest at the next vmentry. Reported-by: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
fd8cb433 |
|
24-Aug-2017 |
Yu Zhang <yu.c.zhang@linux.intel.com> |
KVM: MMU: Expose the LA57 feature to VM. This patch exposes 5 level page table feature to the VM. At the same time, the canonical virtual address checking is extended to support both 48-bits and 57-bits address width. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
855feb67 |
|
24-Aug-2017 |
Yu Zhang <yu.c.zhang@linux.intel.com> |
KVM: MMU: Add 5 level EPT & Shadow page table support. Extends the shadow paging code, so that 5 level shadow page table can be constructed if VM is running in 5 level paging mode. Also extends the ept code, so that 5 level ept table can be constructed if maxphysaddr of VM exceeds 48 bits. Unlike the shadow logic, KVM should still use 4 level ept table for a VM whose physical address width is less than 48 bits, even when the VM is running in 5 level paging mode. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> [Unconditionally reset the MMU context in kvm_cpuid_update. Changing MAXPHYADDR invalidates the reserved bit bitmasks. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9034e6e8 |
|
17-Aug-2017 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: fix use of L1 MMIO areas in nested guests There is currently some confusion between nested and L1 GPAs. The assignment to "direct" in kvm_mmu_page_fault tries to fix that, but it is not enough. What this patch does is fence off the MMIO cache completely when using shadow nested page tables, since we have neither a GVA nor an L1 GPA to put in the cache. This also allows some simplifications in kvm_mmu_page_fault and FNAME(page_fault). The EPT misconfig likewise does not have an L1 GPA to pass to kvm_io_bus_write, so that must be skipped for guest mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Changed comment to say "GPAs" instead of "L1's physical addresses", as per David's review. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
668fffa3 |
|
20-Apr-2017 |
Michael S. Tsirkin <mst@redhat.com> |
kvm: better MWAIT emulation for guests Guests that are heavy on futexes end up IPI'ing each other a lot. That can lead to significant slowdowns and latency increase for those guests when running within KVM. If only a single guest is needed on a host, we have a lot of spare host CPU time we can throw at the problem. Modern CPUs implement a feature called "MWAIT" which allows guests to wake up sleeping remote CPUs without an IPI - thus without an exit - at the expense of never going out of guest context. The decision whether this is something sensible to use should be up to the VM admin, so to user space. We can however allow MWAIT execution on systems that support it properly hardware wise. This patch adds a CAP to user space and a KVM cpuid leaf to indicate availability of native MWAIT execution. With that enabled, the worst a guest can do is waste as many cycles as a "jmp ." would do, so it's not a privilege problem. We consciously do *not* expose the feature in our CPUID bitmap, as most people will want to benefit from sleeping vCPUs to allow for over commit. Reported-by: "Gabriel L. Somlo" <gsomlo@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> [agraf: fix amd, change commit message] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
108b249c |
|
01-Sep-2016 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: introduce get_kvmclock_ns Introduce a function that reads the exact nanoseconds value that is provided to the guest in kvmclock. This crystallizes the notion of kvmclock as a thin veneer over a stable TSC, that the guest will (hopefully) convert with NTP. In other words, kvmclock is *not* a paravirtualized host-to-guest NTP. Drop the get_kernel_ns() function, that was used both to get the base value of the master clock and to get the current value of kvmclock. The former use is replaced by ktime_get_boot_ns(), the latter is the purpose of get_kernel_ns(). This also allows KVM to provide a Hyper-V time reference counter that is synchronized with the time that is computed from the TSC page. Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
8d93c874 |
|
20-Jun-2016 |
Marcelo Tosatti <mtosatti@redhat.com> |
KVM: x86: move nsec_to_cycles from x86.c to x86.h Move the inline function nsec_to_cycles from x86.c to x86.h, as the next patch uses it from lapic.c. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
17a511f8 |
|
22-Mar-2016 |
Huaitong Han <huaitong.han@intel.com> |
KVM, pkeys: add pkeys support for xsave state This patch adds pkeys support for xsave state. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
52004014 |
|
25-Jan-2016 |
Feng Wu <feng.wu@intel.com> |
KVM: x86: Use vector-hashing to deliver lowest-priority interrupts Use vector-hashing to deliver lowest-priority interrupts, As an example, modern Intel CPUs in server platform use this method to handle lowest-priority interrupts. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b51012de |
|
22-Jan-2016 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: introduce do_shl32_div32 This is similar to the existing div_frac function, but it returns the remainder too. Unlike div_frac, it can be used to implement long division, e.g. (a << 64) / b for 32-bit a and b. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d91cab78 |
|
02-Sep-2015 |
Dave Hansen <dave.hansen@linux.intel.com> |
x86/fpu: Rename XSAVE macros There are two concepts that have some confusing naming: 1. Extended State Component numbers (currently called XFEATURE_BIT_*) 2. Extended State Component masks (currently called XSTATE_*) The numbers are (currently) from 0-9. State component 3 is the bounds registers for MPX, for instance. But when we want to enable "state component 3", we go set a bit in XCR0. The bit we set is 1<<3. We can check to see if a state component feature is enabled by looking at its bit. The current 'xfeature_bit's are at best xfeature bit _numbers_. Calling them bits is at best inconsistent with ending the enum list with 'XFEATURES_NR_MAX'. This patch renames the enum to be 'xfeature'. These also happen to be what the Intel documentation calls a "state component". We also want to differentiate these from the "XSTATE_*" macros. The "XSTATE_*" macros are a mask, and we rename them to match. These macros are reasonably widely used so this patch is a wee bit big, but this really is just a rename. The only non-mechanical part of this is the s/XSTATE_EXTEND_MASK/XFEATURE_MASK_EXTEND/ We need a better name for it, but that's another patch. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com [ Ported to v4.3-rc1. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
e83d5887 |
|
03-Jul-2015 |
Andrey Smetanin <asmetanin@virtuozzo.com> |
kvm/x86: move Hyper-V MSR's/hypercall code into hyperv.c file This patch introduce Hyper-V related source code file - hyperv.c and per vm and per vcpu hyperv context structures. All Hyper-V MSR's and hypercall code moved into hyperv.c. All Hyper-V kvm/vcpu fields moved into appropriate hyperv context structures. Copyrights and authors information copied from x86.c to hyperv.c. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Peter Hornyack <peterhornyack@google.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
41dbc6bc |
|
23-Jul-2015 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: introduce kvm_check_has_quirk The logic of the disabled_quirks field usually results in a double negation. Wrap it in a simple function that checks the bit and negates it. Based on a patch from Xiao Guangrong. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6a39bbc5 |
|
15-Jun-2015 |
Xiao Guangrong <guangrong.xiao@linux.intel.com> |
KVM: MTRR: do not map huge page for non-consistent range Based on Intel's SDM, mapping huge page which do not have consistent memory cache for each 4k page will cause undefined behavior In order to avoiding this kind of undefined behavior, we force to use 4k pages under this case Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
19efffa2 |
|
15-Jun-2015 |
Xiao Guangrong <guangrong.xiao@linux.intel.com> |
KVM: MTRR: sort variable MTRRs Sort all valid variable MTRRs based on its base address, it will help us to check a range to see if it's fully contained in variable MTRRs Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> [Fix list insertion sort, simplify var_mtrr_range_is_valid to just test the V bit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ff53604b |
|
15-Jun-2015 |
Xiao Guangrong <guangrong.xiao@linux.intel.com> |
KVM: x86: move MTRR related code to a separate file MTRR code locates in x86.c and mmu.c so that move them to a separate file to make the organization more clearer and it will be the place where we fully implement vMTRR Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
74545705 |
|
27-Apr-2015 |
Radim Krčmář <rkrcmar@redhat.com> |
KVM: x86: fix initial PAT value PAT should be 0007_0406_0007_0406h on RESET and not modified on INIT. VMX used a wrong value (host's PAT) and while SVM used the right one, it never got to arch.pat. This is not an issue with QEMU as it will force the correct value. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bab5bb39 |
|
01-Jan-2015 |
Nicholas Krause <xerofoify@gmail.com> |
kvm: x86: Remove kvm_make_request from lapic.c Adds a function kvm_vcpu_set_pending_timer instead of calling kvm_make_request in lapic.c. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d0659d94 |
|
16-Dec-2014 |
Marcelo Tosatti <mtosatti@redhat.com> |
KVM: x86: add option to advance tscdeadline hrtimer expiration For the hrtimer which emulates the tscdeadline timer in the guest, add an option to advance expiration, and busy spin on VM-entry waiting for the actual expiration time to elapse. This allows achieving low latencies in cyclictest (or any scenario which requires strict timing regarding timer expiration). Reduces average cyclictest latency from 12us to 8us on Core i5 desktop. Note: this option requires tuning to find the appropriate value for a particular hardware/guest combination. One method is to measure the average delay between apic_timer_fn and VM-entry. Another method is to start with 1000ns, and increase the value in say 500ns increments until avg cyclictest numbers stop decreasing. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
612263b3 |
|
22-Oct-2014 |
Chao Peng <chao.p.peng@linux.intel.com> |
KVM: x86: Enable Intel AVX-512 for guest Expose Intel AVX-512 feature bits to guest. Also add checks for xcr0 AVX512 related bits according to spec: http://download-software.intel.com/sites/default/files/managed/71/2e/319433-017.pdf Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4566654b |
|
18-Sep-2014 |
Nadav Amit <namit@cs.technion.ac.il> |
KVM: vmx: Inject #GP on invalid PAT CR Guest which sets the PAT CR to invalid value should get a #GP. Currently, if vmx supports loading PAT CR during entry, then the value is not checked. This patch makes the required check in that case. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
56f17dd3 |
|
18-Aug-2014 |
David Matlack <dmatlack@google.com> |
kvm: x86: fix stale mmio cache bug The following events can lead to an incorrect KVM_EXIT_MMIO bubbling up to userspace: (1) Guest accesses gpa X without a memory slot. The gfn is cached in struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets the SPTE write-execute-noread so that future accesses cause EPT_MISCONFIGs. (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION covering the page just accessed. (3) Guest attempts to read or write to gpa X again. On Intel, this generates an EPT_MISCONFIG. The memory slot generation number that was incremented in (2) would normally take care of this but we fast path mmio faults through quickly_check_mmio_pf(), which only checks the per-vcpu mmio cache. Since we hit the cache, KVM passes a KVM_EXIT_MMIO up to userspace. This patch fixes the issue by using the memslot generation number to validate the mmio cache. Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
27e6fb5d |
|
18-Jun-2014 |
Nadav Amit <namit@cs.technion.ac.il> |
KVM: vmx: vmx instructions handling does not consider cs.l VMX instructions use 32-bit operands in 32-bit mode, and 64-bit operands in 64-bit mode. The current implementation is broken since it does not use the register operands correctly, and always uses 64-bit for reads and writes. Moreover, write to memory in vmwrite only considers long-mode, so it ignores cs.l. This patch fixes this behavior. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
5777392e |
|
18-Jun-2014 |
Nadav Amit <nadav.amit@gmail.com> |
KVM: x86: check DR6/7 high-bits are clear only on long-mode When the guest sets DR6 and DR7, KVM asserts the high 32-bits are clear, and otherwise injects a #GP exception. This exception should only be injected only if running in long-mode. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4ff41732 |
|
23-Feb-2014 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: introduce kvm_supported_xcr0() XSAVE support for KVM is already using host_xcr0 & KVM_SUPPORTED_XCR0 as a "dynamic" version of KVM_SUPPORTED_XCR0. However, this is not enough because the MPX bits should not be presented to the guest unless kvm_x86_ops confirms the support. So, replace all instances of host_xcr0 & KVM_SUPPORTED_XCR0 with a new function kvm_supported_xcr0() that also has this check. Note that here: if (xstate_bv & ~KVM_SUPPORTED_XCR0) return -EINVAL; if (xstate_bv & ~host_cr0) return -EINVAL; the code is equivalent to if ((xstate_bv & ~KVM_SUPPORTED_XCR0) || (xstate_bv & ~host_cr0) return -EINVAL; i.e. "xstate_bv & (~KVM_SUPPORTED_XCR0 | ~host_cr0)" which is in turn equal to "xstate_bv & ~(KVM_SUPPORTED_XCR0 & host_cr0)". So we should also use the new function there. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
390bd528 |
|
24-Feb-2014 |
Liu, Jinsong <jinsong.liu@intel.com> |
KVM: x86: Enable Intel MPX for guest From 44c2abca2c2eadc6f2f752b66de4acc8131880c4 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Mon, 24 Feb 2014 18:12:31 +0800 Subject: [PATCH v5 3/3] KVM: x86: Enable Intel MPX for guest This patch enable Intel MPX feature to guest. Signed-off-by: Xudong Hao <xudong.hao@intel.com> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9ed96e87 |
|
05-Jan-2014 |
Marcelo Tosatti <mtosatti@redhat.com> |
KVM: x86: limit PIT timer frequency Limit PIT timer frequency similarly to the limit applied by LAPIC timer. Cc: stable@kernel.org Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
647e23bb |
|
02-Oct-2013 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: mask unsupported XSAVE entries from leaf 0Dh index 0 XSAVE entries that KVM does not support are reported by KVM_GET_SUPPORTED_CPUID for leaf 0Dh index 0 if the host supports them; they should be left out unless there is also hypervisor support for them. Sub-leafs are correctly handled in supported_xcr0_bit, fix index 0 to match. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
#
8fe8ab46 |
|
29-Nov-2012 |
Will Auld <will.auld.intel@gmail.com> |
KVM: x86: Add code to track call origin for msr assignment In order to track who initiated the call (host or guest) to modify an msr value I have changed function call parameters along the call path. The specific change is to add a struct pointer parameter that points to (index, data, caller) information rather than having this information passed as individual parameters. The initial use for this capability is for updating the IA32_TSC_ADJUST msr while setting the tsc value. It is anticipated that this capability is useful for other tasks. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
54e9818f |
|
05-Aug-2012 |
Gleb Natapov <gleb@redhat.com> |
KVM: use jump label to optimize checking for in kernel local apic presence Usually all vcpus have local apic pointer initialized, so the check may be completely skipped. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
c36fc04e |
|
07-Mar-2012 |
Davidlohr Bueso <dave@gnu.org> |
KVM: x86: add paging gcc optimization Since most guests will have paging enabled for memory management, add likely() optimization around CR0.PG checks. Signed-off-by: Davidlohr Bueso <dave@gnu.org> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
00b27a3e |
|
23-Nov-2011 |
Avi Kivity <avi@redhat.com> |
KVM: Move cpuid code to new file The cpuid code has grown; put it into a separate file. Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
bebb106a |
|
11-Jul-2011 |
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> |
KVM: MMU: cache mmio info on page fault path If the page fault is caused by mmio, we can cache the mmio info, later, we do not need to walk guest page table and quickly know it is a mmio fault while we emulate the mmio instruction Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
6a4d7550 |
|
25-May-2011 |
Nadav Har'El <nyh@il.ibm.com> |
KVM: nVMX: Implement VMPTRST This patch implements the VMPTRST instruction. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
064aea77 |
|
25-May-2011 |
Nadav Har'El <nyh@il.ibm.com> |
KVM: nVMX: Decoding memory operands of VMX instructions This patch includes a utility function for decoding pointer operands of VMX instructions issued by L1 (a guest hypervisor) Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
71f9833b |
|
13-Apr-2011 |
Serge E. Hallyn <serge@hallyn.com> |
KVM: fix push of wrong eip when doing softint When doing a soft int, we need to bump eip before pushing it to the stack. Otherwise we'll do the int a second time. [apw@canonical.com: merged eip update as per Jan's recommendation.] Signed-off-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
24d1b15f |
|
07-Dec-2010 |
Joerg Roedel <joerg.roedel@amd.com> |
KVM: SVM: Do not report xsave in supported cpuid To support xsave properly for the guest the SVM module need software support for it. As long as this is not present do not report the xsave as supported feature in cpuid. As a side-effect this patch moves the bit() helper function into the x86.h file so that it can be used in svm.c too. KVM-Stable-Tag. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
63995653 |
|
19-Sep-2010 |
Mohammed Gamal <m.gamal005@gmail.com> |
KVM: Add kvm_inject_realmode_interrupt() wrapper This adds a wrapper function kvm_inject_realmode_interrupt() around the emulator function emulate_int_real() to allow real mode interrupt injection. [avi: initialize operand and address sizes before emulating interrupts] [avi: initialize rip for real mode interrupt injection] [avi: clear interrupt pending flag after emulating interrupt injection] Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
6539e738 |
|
10-Sep-2010 |
Joerg Roedel <joerg.roedel@amd.com> |
KVM: MMU: Implement nested gva_to_gpa functions This patch adds the functions to do a nested l2_gva to l1_gpa page table walk. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
99e3e30a |
|
20-Aug-2010 |
Zachary Amsden <zamsden@redhat.com> |
KVM: x86: Move TSC offset writes to common code Also, ensure that the storing of the offset and the reading of the TSC are never preempted by taking a spinlock. While the lock is overkill now, it is useful later in this patch series. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
a1f4d395 |
|
21-Jun-2010 |
Avi Kivity <avi@redhat.com> |
KVM: Remove memory alias support As advertised in feature-removal-schedule.txt. Equivalent support is provided by overlapping memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
90d83dc3 |
|
19-Apr-2010 |
Lai Jiangshan <laijs@cn.fujitsu.com> |
KVM: use the correct RCU API for PROVE_RCU=y The RCU/SRCU API have already changed for proving RCU usage. I got the following dmesg when PROVE_RCU=y because we used incorrect API. This patch coverts rcu_deference() to srcu_dereference() or family API. =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by qemu-system-x86/8550: #0: (&kvm->slots_lock){+.+.+.}, at: [<ffffffffa011a6ac>] kvm_set_memory_region+0x29/0x50 [kvm] #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<ffffffffa012262d>] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm] stack backtrace: Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27 Call Trace: [<ffffffff8106c59e>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffffa012f6c1>] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm] [<ffffffffa012263e>] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm] [<ffffffffa011a5d7>] __kvm_set_memory_region+0x636/0x6e2 [kvm] [<ffffffffa011a6ba>] kvm_set_memory_region+0x37/0x50 [kvm] [<ffffffffa015e956>] vmx_set_tss_addr+0x46/0x5a [kvm_intel] [<ffffffffa0126592>] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm] [<ffffffff810a8692>] ? unlock_page+0x27/0x2c [<ffffffff810bf879>] ? __do_fault+0x3a9/0x3e1 [<ffffffffa011b12f>] kvm_vm_ioctl+0x364/0x38d [kvm] [<ffffffff81060cfa>] ? up_read+0x23/0x3d [<ffffffff810f3587>] vfs_ioctl+0x32/0xa6 [<ffffffff810f3b19>] do_vfs_ioctl+0x495/0x4db [<ffffffff810e6b2f>] ? fget_light+0xc2/0x241 [<ffffffff810e416c>] ? do_sys_open+0x104/0x116 [<ffffffff81382d6d>] ? retint_swapgs+0xe/0x13 [<ffffffff810f3ba6>] sys_ioctl+0x47/0x6a [<ffffffff810021db>] system_call_fastpath+0x16/0x1b Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
ff9d07a0 |
|
18-Apr-2010 |
Zhang, Yanmin <yanmin_zhang@linux.intel.com> |
KVM: Implement perf callbacks for guest sampling Below patch implements the perf_guest_info_callbacks on kvm. Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
f6801dff |
|
21-Jan-2010 |
Avi Kivity <avi@redhat.com> |
KVM: Rename vcpu->shadow_efer to efer None of the other registers have the shadow_ prefix. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
836a1b3c |
|
21-Jan-2010 |
Avi Kivity <avi@redhat.com> |
KVM: Move cr0/cr4/efer related helpers to x86.h They have more general scope than the mmu. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
3eeb3288 |
|
21-Jan-2010 |
Avi Kivity <avi@redhat.com> |
KVM: Add a helper for checking if the guest is in protected mode Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
fc61b800 |
|
05-Jul-2009 |
Gleb Natapov <gleb@redhat.com> |
KVM: Add Directed EOI support to APIC emulation Directed EOI is specified by x2APIC, but is available even when lapic is in xAPIC mode. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
66fd3f7f |
|
11-May-2009 |
Gleb Natapov <gleb@redhat.com> |
KVM: Do not re-execute INTn instruction. Re-inject event instead. This is what Intel suggest. Also use correct instruction length when re-injecting soft fault/interrupt. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
923c61bb |
|
11-May-2009 |
Gleb Natapov <gleb@redhat.com> |
KVM: Remove irq_pending bitmap Only one interrupt vector can be injected from userspace irqchip at any given time so no need to store it in a bitmap. Put it into interrupt queue directly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
3298b75c |
|
11-May-2009 |
Gleb Natapov <gleb@redhat.com> |
KVM: Unprotect a page if #PF happens during NMI injection. It is done for exception and interrupt already. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
115666df |
|
21-Apr-2009 |
Gleb Natapov <gleb@redhat.com> |
KVM: Remove kvm_push_irq() No longer used. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
fe4c7b19 |
|
23-Mar-2009 |
Gleb Natapov <gleb@redhat.com> |
KVM: reuse (pop|push)_irq from svm.c in vmx.c The prioritized bit vector manipulation functions are useful in both vmx and svm. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
937a7eae |
|
03-Jul-2008 |
Avi Kivity <avi@qumranet.com> |
KVM: Add a pending interrupt queue Similar to the exception queue, this hold interrupts that have been accepted by the virtual processor core but not yet injected. Not yet used. Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
26eef70c |
|
03-Jul-2008 |
Avi Kivity <avi@qumranet.com> |
KVM: Clear exception queue before emulating an instruction If we're emulating an instruction, either it will succeed, in which case any previously queued exception will be spurious, or we will requeue the same exception. Signed-off-by: Avi Kivity <avi@qumranet.com>
|