Cross Reference: /freebsd-current/sys/amd64/vmm/amd/svm.c

History log of /freebsd-current/sys/amd64/vmm/amd/svm.c
Revision	Date	Author	Comments
# 683ea4d2	29-Dec-2023	Vitaliy Gusev <gusev.vitaliy@gmail.com>	vmm: MTRR should be saved/restored This fixes restoring a Linux VM if it was suspended while in the GRUB menu. Adding MTRR increases the kernel dump size by 256 bytes per vCPU. Sponsored by: vStack Reviewed by: markj, rew Differential Revision: https://reviews.freebsd.org/D43226
# 181afaaa	07-Dec-2023	Bojan Novković <bojan.novkovic@fer.hr>	vmm: implement VM_CAP_MASK_HWINTR on AMD CPUs This patch implements the interrupt blocking VM capability on AMD CPUs. Implementing this capability allows the GDB stub to single-step a virtual machine without landing inside interrupt handlers. Reviewed by: jhb, corvink Sponsored by: Google, Inc. (GSoC 2022) Differential Revision: https://reviews.freebsd.org/D42299
# e3b4fe64	07-Dec-2023	Bojan Novković <bojan.novkovic@fer.hr>	vmm: implement single-stepping for AMD CPUs This patch implements single-stepping for AMD CPUs using the RFLAGS.TF single-stepping mechanism. The GDB stub requests single-stepping using the VM_CAP_RFLAGS_TF capability. Setting this capability will set the RFLAGS.TF bit on the selected vCPU, activate DB exception intercepts, and activate POPF/PUSHF instruction intercepts. The resulting DB exception is then caught by the IDT_DB vmexit handler and bounced to userland where it is processed by the GDB stub. This patch also makes sure that the value of the TF bit is correctly updated and that it is not erroneously propagated into memory. Stepping over PUSHF will cause the vm_handle_db function to correct the pushed RFLAGS value and stepping over POPF will update the shadowed TF bit copy. Reviewed by: jhb Sponsored by: Google, Inc. (GSoC 2022) Differential Revision: https://reviews.freebsd.org/D42296
# 231eee17	07-Dec-2023	Bojan Novković <bojan.novkovic@fer.hr>	vmm: enable software breakpoints for AMD CPUs This patch adds support for software breakpoint vmexits on AMD SVM. It implements the VM_CAP_BPT_EXIT used to enable software breakpoints. When enabled, breakpoint vmexits are passed to userspace where they are handled by the GDB stub. Reviewed by: jhb Sponsored by: Google, Inc. (GSoC 2022) Differential Revision: https://reviews.freebsd.org/D42295
# 78c1d174	07-Dec-2023	Bojan Novković <bojan.novkovic@fer.hr>	vmm: refactor event reflection in AMD SVM This patch refactors AMD SVM event reflection to allow events to be propagated to userland, rather than always reflected into the guest. This is necessary to implement some capabilities that request VMEXITs when a specific exception occurs (e.g. VM_CAP_BPT_EXIT). Reviewed by: jhb Sponsored by: Google, Inc. (GSoC 2022) Differential Revision: https://reviews.freebsd.org/D42405
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
# 4d846d26	10-May-2023	Warner Losh <imp@FreeBSD.org>	spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
# b10e100d	05-May-2023	Corvin Köhne <corvink@FreeBSD.org>	vmm: don't free unallocated memory If vmx or svm is disabled in BIOS or the device isn't supported by vmm, modinit won't allocate these state save areas. As kmem_free panics when passing a NULL pointer to it, loading the vmm kernel module causes a panic too. PR: 271251 Reviewed by: markj Fixes: 74ac712f72cfd6d7b3db3c9d3b72ccf2824aa183 ("vmm: Dynamically allocate a couple of per-CPU state save areas") MFC after: 1 week Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D39974
# 74ac712f	26-Apr-2023	Mark Johnston <markj@FreeBSD.org>	vmm: Dynamically allocate a couple of per-CPU state save areas This avoids bloating the BSS when MAXCPU is large. No functional change intended. PR: 269572 Reviewed by: corvink, rew Tested by: rew MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D39805
# 8104fc31	28-Feb-2023	Vitaliy Gusev <gusev.vitaliy@gmail.com>	bhyve: fix restore of kernel structs vmx_snapshot() and svm_snapshot() do not save any data and error occurs at resume: Restoring kernel structs... vm_restore_kern_struct: Kernel struct size was 0 for: vmx Failed to restore kernel structs. Reviewed by: corvink, markj Fixes: 39ec056e6dbd89e26ee21d2928dbd37335de0ebc ("vmm: Rework snapshotting of CPU-specific per-vCPU data.") MFC after: 2 weeks Sponsored by: vStack Differential Revision: https://reviews.freebsd.org/D38476
# 892feec2	15-Nov-2022	Corvin Köhne <corvink@FreeBSD.org>	vmm: avoid spurious rendezvous A vcpu only checks if a rendezvous is in progress or not to decide if it should handle a rendezvous. This could lead to spurious rendezvous where a vcpu tries to handle a rendezvous it isn't part of. This situation is properly handled by vm_handle_rendezvous but it could potentially degrade the performance. Avoid that by an early check if the vcpu is part of the rendezvous or not. At the moment, rendezvous are only used to spin up application processors and to send ioapic interrupts. Spinning up application processors is done in the guest boot phase by sending INIT SIPI sequences to single vcpus. This is known to cause spurious rendezvous and only occurs in the boot phase. Sending ioapic interrupts is rare because modern guests will use MSI and the rendezvous is always sent to all vcpus. Reviewed by: jhb MFC after: 1 week Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D37390
# 80cb5d84	18-Nov-2022	John Baldwin <jhb@FreeBSD.org>	vmm: Pass vcpu instead of vm and vcpuid to APIs used from CPU backends. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37162
# d3956e46	18-Nov-2022	John Baldwin <jhb@FreeBSD.org>	vmm: Use struct vcpu in the instruction emulation code. This passes struct vcpu down in place of struct vm and an integer vcpu index through the in-kernel instruction emulation code. To minimize userland disruption, helper macros are used for the vCPU arguments passed into and through the shared instruction emulation code. A few other APIs used by the instruction emulation code have also been updated to accept struct vcpu in the kernel including vm_get/set_register and vm_inject_fault. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37161
# 3dc3d32a	18-Nov-2022	John Baldwin <jhb@FreeBSD.org>	vmm: Use struct vcpu with the vmm_stat API. The function callbacks still use struct vm and a vCPU index. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37157
# 950af9ff	18-Nov-2022	John Baldwin <jhb@FreeBSD.org>	vmm: Expose struct vcpu as an opaque type. Pass a pointer to the current struct vcpu to the vcpu_init callback and save this pointer in the CPU-specific vcpu structures. Add routines to fetch a struct vcpu by index from a VM and to query the VM and vcpuid from a struct vcpu. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37156
# fca494da	18-Nov-2022	John Baldwin <jhb@FreeBSD.org>	vmm svm: Add SVM_CTR* wrapper macros. These macros are similar to VCPU_CTR* but accept a single svm_vcpu pointer as the first argument instead of separate vm and vcpuid. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37153
# 869c8d19	18-Nov-2022	John Baldwin <jhb@FreeBSD.org>	vmm: Remove the per-vm cookie argument from vmmops taking a vcpu. This requires storing a reference to the per-vm cookie in the CPU-specific vCPU structure. Take advantage of this new field to remove no-longer-needed function arguments in the CPU-specific backends. In particular, stop passing the per-vm cookie to functions that either don't use it or only use it for KTR traces. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37152
# 1aa51504	18-Nov-2022	John Baldwin <jhb@FreeBSD.org>	vmm: Refactor storage of CPU-dependent per-vCPU data. Rather than storing static arrays of per-vCPU data in the CPU-specific per-VM structure, adopt a more dynamic model similar to that used to manage CPU-specific per-VM data. That is, add new vmmops methods to init and cleanup a single vCPU. The init method returns a pointer that is stored in 'struct vcpu' as a cookie pointer. This cookie pointer is now passed to other vmmops callbacks in place of the integer index. The index is now only used in KTR traces and when calling back into the CPU-independent layer. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37151
# 39ec056e	18-Nov-2022	John Baldwin <jhb@FreeBSD.org>	vmm: Rework snapshotting of CPU-specific per-vCPU data. Previously some per-vCPU state was saved in vmmops_snapshot and other state was saved in vmmops_vmcx_snapshot. Consolidate all per-vCPU state into the latter routine and rename the hook to the more generic 'vcpu_snapshot'. Note that the CPU-independent per-vCPU data is still stored in a separate blob as well as the per-vCPU local APIC data. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37150
# 19b9dd2e	18-Nov-2022	John Baldwin <jhb@FreeBSD.org>	vmm svm: Mark all VMCB state caches dirty on vCPU restore. Mark Johnston noticed that this was missing VMCB_CACHE_LBR. Just set all the bits as is done in svm_run() rather than trying to clear individual bits. Reported by: markj Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37259
# 215d2fd5	18-Nov-2022	John Baldwin <jhb@FreeBSD.org>	vmm svm: Refactor per-vCPU data. - Allocate VMCBs separately to avoid excessive padding in struct svm_vcpu. - Allocate APIC pages dynamically directly in struct vlapic. - Move vm_mtrr into struct svm_vcpu rather than using a separate parallel array. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37148
# 35abc6c2	18-Nov-2022	John Baldwin <jhb@FreeBSD.org>	vmm: Use vm_get_maxcpus() instead of VM_MAXCPU in various places. Mostly these are loops that iterate over all possible vCPU IDs for a specific virtual machine. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37147
# 0bda8d3e	07-Sep-2022	Corvin Köhne <CorvinK@beckhoff.com>	vmm: permit some IPIs to be handled by userspace Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland. INIT and STARTUP IPIs are now returned to userland. Due to backward compatibility reasons, a new capability is added for enabling VM_EXITCODE_IPI. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D35623 Sponsored by: Beckhoff Automation GmbH & Co. KG
# 3fc17484	09-Sep-2022	Emmanuel Vadot <manu@FreeBSD.org>	Revert "vmm: permit some IPIs to be handled by userspace" This reverts commit a5a918b7a906eaa88e0833eac70a15989d535b02. This causes problems with VMs using bhyveload. Reported by: pho, kp
# a5a918b7	07-Sep-2022	Corvin Köhne <CorvinK@beckhoff.com>	vmm: permit some IPIs to be handled by userspace Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland. INIT and Startup IPIs are now returned to userland. Due to backward compatibility reasons, a new capability is added for enabling VM_EXITCODE_IPI. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D35623 Sponsored by: Beckhoff Automation GmbH & Co. KG
# 4eadbef9	27-Jul-2022	Corvin Köhne <CorvinK@beckhoff.com>	vmm: emulate INVD by ignoring it On physical systems the ram isn't initialized on boot. So, coreboot uses the cache as ram in this boot phase. When exiting cache as ram, coreboot calls INVD for making the cache consistent. In a virtual environment ram is always initialized and the cache is always consistent. So, we can safely ignore this call. Reviewed by: jhb, imp Differential Revision: https://reviews.freebsd.org/D35620 Sponsored by: Beckhoff Automation GmbH & Co. KG
# 9aa02d51	30-Jun-2022	Mihai Burcea <mihaiburcea15@gmail.com>	vmm: Fix snapshots for AMD CPUs This patch fixes the AMD implementation for snapshotting. It removes unnecessary vmcb fields that should not be saved and duplicates. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33431
# 3ba952e1	30-May-2022	Corvin Köhne <CorvinK@beckhoff.com>	vmm: add tunable to trap WBINVD x86 is cache coherent. However, there are special cases where cache coherency isn't ensured (e.g. when switching the caching mode). In these cases, WBINVD can be used. WBINVD writes all cache lines back into main memory and invalidates the whole cache. Due to the invalidation of the whole cache, WBINVD is a very heavy instruction and degrades the performance on all cores. So, we should minimize the use of WBINVD as much as possible. In a virtual environment, the WBINVD call is mostly useless. The guest isn't able to break cache coherency because it can't switch the physical cache mode. When using PCI passthrough, WBINVD might be useful. Nevertheless, trapping and ignoring WBINVD is an unsafe operation. For that reason, we implement it as a tunable. Reviewed by: jhb Sponsored by: Beckhoff Automation GmbH & Co. KG MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D35253
# f877977a	10-Apr-2022	Robert Wing <rew@FreeBSD.org>	vmm: fix set but not used warnings
# b7924341	27-Aug-2021	Andrew Turner <andrew@FreeBSD.org>	Create sys/reg.h for the common code previously in machine/reg.h Move the common kernel function signatures from machine/reg.h to a new sys/reg.h. This is in preparation for adding PT_GETREGSET to ptrace(2). Reviewed by: imp, markj Sponsored by: DARPA, AFRL (original work) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19830
# 15add60d	27-Nov-2020	Peter Grehan <grehan@FreeBSD.org>	Convert vmm_ops calls to IFUNC There is no need for these to be function pointers since they are never modified post-module load. Rename AMD/Intel ops to be more consistent. Submitted by: adam_fenn.io Reviewed by: markj, grehan Approved by: grehan (bhyve) MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D27375
# 6f5a9606	11-Nov-2020	Mark Johnston <markj@FreeBSD.org>	vmm: Make pmap_invalidate_ept() wait synchronously for guest exits Currently EPT TLB invalidation is done by incrementing a generation counter and issuing an IPI to all CPUs currently running vCPU threads. The VMM inner loop caches the most recently observed generation on each host CPU and invalidates TLB entries before executing the VM if the cached generation number is not the most recent value. pmap_invalidate_ept() issues IPIs to force each vCPU to stop executing guest instructions and reload the generation number. However, it does not actually wait for vCPUs to exit, potentially creating a window where guests may continue to reference stale TLB entries. Fix the problem by bracketing guest execution with an SMR read section which is entered before loading the invalidation generation. Then, pmap_invalidate_ept() increments the current write sequence before loading pm_active and sending IPIs, and polls readers to ensure that all vCPUs potentially operating with stale TLB entries have exited before pmap_invalidate_ept() returns. Also ensure that unsynchronized loads of the generation counter are wrapped with atomic(9), and stop (inconsistently) updating the invalidation counter and pm_active bitmask with acquire semantics. Reviewed by: grehan, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26910
# a3f2a9c5	01-Oct-2020	John Baldwin <jhb@FreeBSD.org>	Clear the upper 32-bits of registers in x86_emulate_cpuid(). Per the Intel manuals, CPUID is supposed to unconditionally zero the upper 32 bits of the involved (rax/rbx/rcx/rdx) registers. Previously, the emulation would cast pointers to the 64-bit register values down to `uint32_t`, which while properly manipulating the lower bits, would leave any garbage in the upper bits uncleared. While no existing guest OSes seem to stumble over this in practice, the bhyve emulation should match x86 expectations. This was discovered through alignment warnings emitted by gcc9, while testing it against SmartOS/bhyve. SmartOS bug: https://smartos.org/bugview/OS-8168 Submitted by: Patrick Mooney Reviewed by: rgrimes Differential Revision: https://reviews.freebsd.org/D24727
# 09860d44	15-Sep-2020	Ed Maste <emaste@FreeBSD.org>	bhyve: do not permit write access to VMCB / VMCS Reported by: Patrick Mooney Submitted by: jhb Security: CVE-2020-24718
# 101d5b52	15-Sep-2020	Konstantin Belousov <kib@FreeBSD.org>	bhyve: intercept AMD SVM instructions. Intercept and report #UD to VM on SVM/AMD in case VM tried to execute an SVM instruction. Otherwise, SVM allows execution of them, and instructions operate on host physical addresses despite being executed in guest mode. Reported by: Maxime Villard <max@m00nbsd.net> admbug: 972 CVE: CVE-2020-7467 Reviewed by: grehan, markj Differential revision: https://reviews.freebsd.org/D26313
# 543769bf	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	amd64: clean up empty lines in .c and .h files
# 9ce875d9	23-Aug-2020	Konstantin Belousov <kib@FreeBSD.org>	amd64 pmap: LA57 AKA 5-level paging Since LA57 was moved to the main SDM document with revision 072, it seems that we should have support for it, and silicon is coming. This patch makes pmap support both LA48 and LA57 hardware. The selection of page table level is done at startup, kernel always receives control from loader with 4-level paging. It is not clear how UEFI spec would adapt to LA57, for instance it could hand out control in LA57 mode sometimes. To switch from LA48 to LA57 requires turning off long mode, requesting LA57 in CR4, then re-entering long mode. This is somewhat delicate and done in pmap_bootstrap_la57(). AP startup in LA57 mode is much easier, we only need to toggle a bit in CR4 and load right value in CR3. I decided to not change kernel map for now. Single PML5 entry is created that points to the existing kernel_pml4 (KML4Phys) page, and a pml5 entry to create our recursive mapping for vtopte()/vtopde(). This decision is motivated by the fact that we cannot overcommit for KVA, so large space there is unusable until machines start providing wider physical memory addressing. Another reason is that I do not want to break our fragile autotuning, so the KVA expansion is not included into this first step. Nice side effect is that minidumps are compatible. On the other hand, (very) large address space is definitely immediately useful for some userspace applications. For userspace, numbering of pte entries (or page table pages) is always done for 5-level structures even if we operate in 4-level mode. The pmap_is_la57() function is added to report the mode of the specified pmap, this is done not to allow simultaneous 4-/5-levels (which is not allowed by hw), but to accommodate EPT, which has separate level control and in principle might not allow 5-level EPT even though x86 paging supports it. Anyway, it does not seem critical to have 5-level EPT support now.
Tested by: pho (LA48 hardware) Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25273
# 483d953a	04-May-2020	John Baldwin <jhb@FreeBSD.org>	Initial support for bhyve save and restore. Save and restore (also known as suspend and resume) permits a snapshot to be taken of a guest's state that can later be resumed. In the current implementation, bhyve(8) creates a UNIX domain socket that is used by bhyvectl(8) to send a request to save a snapshot (and optionally exit after the snapshot has been taken). A snapshot currently consists of two files: the first holds a copy of guest RAM, and the second file holds other guest state such as vCPU register values and device model state. To resume a guest, bhyve(8) must be started with a matching pair of command line arguments to instantiate the same set of device models as well as a pointer to the saved snapshot. While the current implementation is useful for several use cases, it has a few limitations. The file format for saving the guest state is tied to the ABI of internal bhyve structures and is not self-describing (in that it does not communicate the set of device models present in the system). In addition, the state saved for some device models closely matches the internal data structures which might prove a challenge for compatibility of snapshot files across a range of bhyve versions. The file format also does not currently support versioning of individual chunks of state. As a result, the current file format is not a fixed binary format and future revisions to save and restore will break binary compatibility of snapshot files. The goal is to move to a more flexible format that adds versioning, etc. and at that point to commit to providing a reasonable level of compatibility. As a result, the current implementation is not enabled by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option for userland builds, and the kernel option BHYVE_SNAPSHOT.
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz Relnotes: yes Sponsored by: University Politehnica of Bucharest Sponsored by: Matthew Grooms (student scholarships) Sponsored by: iXsystems Differential Revision: https://reviews.freebsd.org/D19495
# b40598c5	15-Feb-2020	Pawel Biernacki <kaktus@FreeBSD.org>	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (4 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren't properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Reviewed by: kib Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D23625 X-Generally looks fine: jhb
# cbd03a9d	13-Dec-2019	John Baldwin <jhb@FreeBSD.org>	Support software breakpoints in the debug server on Intel CPUs. - Allow the userland hypervisor to intercept breakpoint exceptions (BP#) in the guest. A new capability (VM_CAP_BPT_EXIT) is used to enable this feature. These exceptions are reported to userland via a new VM_EXITCODE_BPT that includes the length of the original breakpoint instruction. If userland wishes to pass the exception through to the guest, it must be explicitly re-injected via vm_inject_exception(). - Export VMCS_ENTRY_INST_LENGTH as a VM_REG_GUEST_ENTRY_INST_LENGTH pseudo-register. Injecting a BP# on Intel requires setting this to the length of the breakpoint instruction. AMD SVM currently ignores writes to this register (but reports success) and fails to read it. - Rework the per-vCPU state tracked by the debug server. Rather than a single 'stepping_vcpu' global, add a structure for each vCPU that tracks state about that vCPU ('stepping', 'stepped', and 'hit_swbreak'). A global 'stopped_vcpu' tracks which vCPU is currently reporting an event. Event handlers for MTRAP and breakpoint exits loop until the associated event is reported to the debugger. Breakpoint events are discarded if the breakpoint is not present when a vCPU resumes in the breakpoint handler to retry submitting the breakpoint event. - Maintain a linked-list of active breakpoints in response to the GDB 'Z0' and 'z0' packets. Reviewed by: markj (earlier version) MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D20309
# e08087ee	28-Aug-2019	John Baldwin <jhb@FreeBSD.org>	Use get_pcpu() to fetch the current CPU's pcpu pointer. This avoids encoding knowledge about how pcpu objects are allocated and is also a few instructions shorter. MFC after: 2 weeks
# 13a7c4d4	07-Aug-2019	Mark Johnston <markj@FreeBSD.org>	Use designated initializers for vmm_ops. MFC after: 3 days
# a488c9c9	25-Apr-2019	Rodney W. Grimes <rgrimes@FreeBSD.org>	Add accessor function for vm->maxcpus Replace most VM_MAXCPU constant uses with an accessor function to vm->maxcpus which for now is initialized and kept at the value of VM_MAXCPU. This is a rework of Fabian Freyer's (fabian.freyer_physik.tu-berlin.de) work from D10070 to adjust it for the cpu topology changes that occurred in r332298 Submitted by: Fabian Freyer (fabian.freyer_physik.tu-berlin.de) Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Approved by: bde (mentor), jhb (maintainer) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D18755
# de679f6e	15-Oct-2018	John Baldwin <jhb@FreeBSD.org>	Reload the LDT selector after an AMD-v #VMEXIT. cpu_switch() always reloads the LDT, so this can only affect the hypervisor process itself. Fix this by explicitly reloading the host LDT selector after each #VMEXIT. The stock bhyve process on FreeBSD never uses a custom LDT, so this change is cosmetic. Reviewed by: kib Tested by: Mike Tancsa <mike@sentex.net> Approved by: re (gjb) MFC after: 2 weeks
# ebc3c37c	13-Jun-2018	Marcelo Araujo <araujo@FreeBSD.org>	Add SPDX tags to vmm(4). MFC after: 4 weeks. Sponsored by: iXsystems Inc.
# 9e2154ff	21-May-2018	John Baldwin <jhb@FreeBSD.org>	Cleanups related to debug exceptions on x86. - Add constants for fields in DR6 and the reserved fields in DR7. Use these constants instead of magic numbers in most places that use DR6 and DR7. - Refer to T_TRCTRAP as "debug exception" rather than a "trace trap" as it is not just for trace exceptions. - Always read DR6 for debug exceptions and only clear TF in the flags register for user exceptions where DR6.BS is set. - Clear DR6 before returning from a debug exception handler as recommended by the SDM dating all the way back to the 386. This allows debuggers to determine the cause of each exception. For kernel traps, clear DR6 in the T_TRCTRAP case and pass DR6 by value to other parts of the handler (namely, user_dbreg_trap()). For user traps, wait until after trapsignal to clear DR6 so that userland debuggers can read DR6 via PT_GETDBREGS while the thread is stopped in trapsignal(). Reviewed by: kib, rgrimes MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D15189
# fc276d92	06-Apr-2018	John Baldwin <jhb@FreeBSD.org>	Add a way to temporarily suspend and resume virtual CPUs. This is used as part of implementing run control in bhyve's debug server. The hypervisor now maintains a set of "debugged" CPUs. Attempting to run a debugged CPU will fail to execute any guest instructions and will instead report a VM_EXITCODE_DEBUG exit to the userland hypervisor. Virtual CPUs are placed into the debugged state via vm_suspend_cpu() (implemented via a new VM_SUSPEND_CPU ioctl). Virtual CPUs can be resumed via vm_resume_cpu() (VM_RESUME_CPU ioctl). The debug server suspends virtual CPUs when it wishes them to stop executing in the guest (for example, when a debugger attaches to the server). The debug server can choose to resume only a subset of CPUs (for example, when single stepping) or it can choose to resume all CPUs. The debug server must explicitly mark a CPU as resumed via vm_resume_cpu() before the virtual CPU will successfully execute any guest instructions. Reviewed by: avg, grehan Tested on: Intel (jhb), AMD (avg) Differential Revision: https://reviews.freebsd.org/D14466
# 7b394c10	16-Feb-2018	Andriy Gapon <avg@FreeBSD.org>	move vintr_intercept_enabled under INVARIANTS The function is not used outside of INVARIANTS since r328622. MFC after: 1 week
# 6a8b7aa4	31-Jan-2018	Andriy Gapon <avg@FreeBSD.org>	vmm/svm: post LAPIC interrupts using event injection, not virtual interrupts The virtual interrupt method uses V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR fields of VMCB to inject a virtual interrupt into a guest VM. This method has many advantages over the direct event injection as it offloads all decisions of whether and when the interrupt can be delivered to the guest. But with a purely software emulated vAPIC the advantage is also a problem. The problem is that the hypervisor does not have any precise control over when the interrupt is actually delivered to the guest (or a notification about that). Because of that the hypervisor cannot update the interrupt vector in IRR and ISR in the same way as real hardware would. The hypervisor becomes aware that the interrupt is being serviced only upon the first VMEXIT after the interrupt is delivered. This creates a window between the actual interrupt delivery and the update of IRR and ISR. That means that IRR and ISR might not be correctly set up to the point of the end-of-interrupt signal. The described deviation has been observed to cause an interrupt loss in the following scenario. vCPU0 posts an inter-processor interrupt to vCPU1. The interrupt is injected as a virtual interrupt by the hypervisor. The interrupt is delivered to a guest and an interrupt handler is invoked. The handler performs a requested action and acknowledges the request by modifying a global variable. So far, there is no VMEXIT and the hypervisor is unaware of the events. Then, vCPU0 notices the acknowledgment and sends another IPI with the same vector. The IPI gets collapsed into the previous IPI in the IRR of vCPU1. Only after that a VMEXIT of vCPU1 occurs. At that time the vector is cleared in the IRR and is set in the ISR. vCPU1 has vAPIC state as if the second IPI has never been sent. 
The scenario is impossible on the real hardware because IRR and ISR are updated just before the interrupt handler gets started. I saw several possibilities of fixing the problem. One is to intercept the virtual interrupt delivery to update IRR and ISR at the right moment. The other is to deliver the LAPIC interrupts using the event injection, same as legacy interrupts. I opted to use the latter approach for several reasons. It's equivalent to what VMM/Intel does (in !VMX case). It appears to be what VirtualBox and KVM do. The code is already there (to support legacy interrupts). Another possibility was to use a special intermediate state for a vector after it is injected using a virtual interrupt and before it is known whether it was accepted or is still pending. That approach was implemented in https://reviews.freebsd.org/D13828 That method is more complex and does not have any clear advantage. Please see sections 15.20 and 15.21.4 of "AMD64 Architecture Programmer's Manual Volume 2: System Programming" (publication 24593, revision 3.29) for comparison between event injection and virtual interrupt injection. PR: 215972 Reported by: ajschot@hotmail.com, grehan Tested by: anish, grehan, Nils Beyer <nbe@renzel.net> Reviewed by: anish, grehan MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D13780
# 65eefbe4	17-Jan-2018	John Baldwin <jhb@FreeBSD.org>	Save and restore guest debug registers. Currently most of the debug registers are not saved and restored during VM transitions allowing guest and host debug register values to leak into the opposite context. One result is that hardware watchpoints do not work reliably within a guest under VT-x. Due to differences in SVM and VT-x, slightly different approaches are used. For VT-x: - Enable debug register save/restore for VM entry/exit in the VMCS for DR7 and MSR_DEBUGCTL. - Explicitly save DR0-3,6 of the guest. - Explicitly save DR0-3,6-7, MSR_DEBUGCTL, and the trap flag from %rflags for the host. Note that because DR6 is "software" managed and not stored in the VMCS a kernel debugger which single steps through VM entry could corrupt the guest DR6 (since a single step trap taken after loading the guest DR6 could alter the DR6 register). To avoid this, explicitly disable single-stepping via the trace flag before loading the guest DR6. A determined debugger could still defeat this by setting a breakpoint after the guest DR6 was loaded and then single-stepping. For SVM: - Enable debug register caching in the VMCB for DR6/DR7. - Explicitly save DR0-3 of the guest. - Explicitly save DR0-3,6-7, and MSR_DEBUGCTL for the host. Since SVM saves the guest DR6 in the VMCB, the race with single-stepping described for VT-x does not exist. For both platforms, expose all of the guest DRx values via --get-drX and --set-drX flags to bhyvectl. Discussed with: avg, grehan Tested by: avg (SVM), myself (VT-x) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D13229
# 091da2df	09-Jan-2018	Andriy Gapon <avg@FreeBSD.org>	vmm/svm: contigmalloc of the whole svm_softc is excessive This is a followup to r307903. struct svm_softc takes more than 200 kilobytes while what we really need is 3 contiguous pages for I/O permission map and 2 contiguous pages for MSR permission map. Other physically mapped structures have a size of a single page, so a proper alignment is sufficient for their correct mapping. Thus, only the permission maps are allocated with contigmalloc now, the softc is allocated with a regular malloc. Additionally, this commit adds a check that malloc returns memory with the expected page alignment and that contigmalloc does not fail. Unfortunately, at present svm_vminit() is expected to always succeed and there is no way to report an error. So, a contigmalloc failure leads to a panic. We should probably fix this. MFC after: 2 weeks
# 978f3da1	26-Mar-2017	Andriy Gapon <avg@FreeBSD.org>	revert r315959 because it causes build problems The change introduced a dependency between genassym.c and header files generated from .m files, but that dependency is not specified in the make files. Also, the change could be not as useful as I thought it was. Reported by: dchagin, Manfred Antar <null@pozo.com>, and many others
# a7b4c009	25-Mar-2017	Andriy Gapon <avg@FreeBSD.org>	specific end of interrupt implementation for AMD Local APIC The change is more intrusive than I would like because the feature requires that a vector number is written to a special register. Thus, now the vector number has to be provided to lapic_eoi(). It was readily available in the IO-APIC and MSI cases, but the IPI handlers required more work. Also, we now store the VMM IPI number in a global variable, so that it is available to the justreturn handler for the same reason. Reviewed by: kib MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D9880
# 2f4c4321	28-Oct-2016	Andriy Gapon <avg@FreeBSD.org>	fix a syntax error in r308039 ... that I somehow introduced between testing the change and committing it. MFC after: 1 week X-MFC with: r307903
# 211029ce	28-Oct-2016	Andriy Gapon <avg@FreeBSD.org>	vmm: another take at maximum address passed to contigmalloc Just using vm_paddr_t value with all bits set. That should work as long as the type is unsigned. While there, fix a couple of whitespace issues nearby. MFC after: 1 week X-MFC with: r307903
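The "all bits set" idiom mentioned in 211029ce can be sketched as below. This is only an illustration: `paddr_t` is a hypothetical stand-in for the kernel's `vm_paddr_t` (assumed here to be an unsigned 64-bit type), and `max_paddr()` is not a function from svm.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's vm_paddr_t (assumed unsigned, 64-bit). */
typedef uint64_t paddr_t;

/*
 * Complementing zero sets every bit, which yields the largest value the
 * type can hold. This is only well defined as "maximum" for unsigned types.
 */
static paddr_t
max_paddr(void)
{
	return (~(paddr_t)0);
}
```

Because the complement is applied after the cast, the result follows the width of the typedef rather than the width of `int`.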
# 1ea77652	25-Oct-2016	Andriy Gapon <avg@FreeBSD.org>	fix up r307903, use correct max address definition MFC after: 1 week X-MFC with: r307903
# 3387e874	25-Oct-2016	Andriy Gapon <avg@FreeBSD.org>	vmm/svm: iopm_bitmap and msr_bitmap must be contiguous in physical memory To achieve that the whole svm_softc is allocated with contigmalloc now. It would be more efficient to de-embed those arrays and allocate only them with contigmalloc. Previously, if malloc(9) used non-contiguous pages for the arrays, then random bits in physical pages next to the first page would be used to determine permissions for I/O port and MSR accesses. That could result in a guest dangerously modifying the host hardware configuration. One example is that sometimes NMI watchdog driver in a Linux guest would be able to configure a performance counter on a host system. The counter would generate an interrupt and if hwpmc(4) driver is loaded on the host, then the interrupt would be delivered as an NMI. Discussed with: jhb Reviewed by: grehan MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8321
# a1e1814d	22-Feb-2016	Svatopluk Kraus <skra@FreeBSD.org>	As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to include it explicitly when <vm/pmap.h> is already included. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D5373
# 90e528f8	22-Jun-2015	Neel Natu <neel@FreeBSD.org>	Restore the host's GS.base before returning from 'svm_launch()'. Previously this was done by the caller of 'svm_launch()' after it returned. This works fine as long as no code is executed in the interim that depends on pcpu data. The dtrace probe 'fbt:vmm:svm_launch:return' broke this assumption because it calls 'dtrace_probe()' which in turn relies on pcpu data. Reported by: avg MFC after: 1 week
# 9b1aa8d6	18-Jun-2015	Neel Natu <neel@FreeBSD.org>	Restructure memory allocation in bhyve to support "devmem". devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer where doing a trap-and-emulate for every access is impractical. devmem is a hybrid of system memory (sysmem) and emulated device models. devmem is mapped in the guest address space via nested page tables similar to sysmem. However the address range where devmem is mapped may be changed by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is usually mapped RO or RW as compared to RWX mappings for sysmem. Each devmem segment is named (e.g. "bootrom") and this name is used to create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom). The device node supports mmap(2) and this decouples the host mapping of devmem from its mapping in the guest address space (which can change). Reviewed by: tychon Discussed with: grehan Differential Revision: https://reviews.freebsd.org/D2762 MFC after: 4 weeks
# b14bd6ac	03-Jun-2015	Neel Natu <neel@FreeBSD.org>	Use tunable 'hw.vmm.svm.features' to disable specific SVM features even though they might be available in hardware. Use tunable 'hw.vmm.svm.num_asids' to limit the number of ASIDs used by the hypervisor. MFC after: 1 week
# 248e6799	28-May-2015	Neel Natu <neel@FreeBSD.org>	Fix non-deterministic delays when accessing a vcpu that was in "running" or "sleeping" state. This is done by forcing the vcpu to transition to "idle" by returning to userspace with an exit code of VM_EXITCODE_REQIDLE. MFC after: 2 weeks
# ea91ca92	05-May-2015	Neel Natu <neel@FreeBSD.org>	Do a proper emulation of guest writes to MSR_EFER. - Must-Be-Zero bits cannot be set. - EFER_LME and EFER_LMA should respect the long mode consistency checks. - EFER_NXE, EFER_FFXSR, EFER_TCE can be set if allowed by CPUID capabilities. - Flag an error if guest tries to set EFER_LMSLE since bhyve doesn't enforce segment limits in 64-bit mode. MFC after: 2 weeks
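A simplified sketch of the checks listed in ea91ca92 follows. It is not the code from svm.c: the `SK_` macro names and `efer_write_ok()` are hypothetical, the bit positions follow the architectural EFER layout, the `allowed_by_cpuid` mask is assumed to have been computed elsewhere, and the long mode consistency checks for EFER_LME/EFER_LMA are deliberately omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFER bits; the SK_ prefix marks these names as illustrative. */
#define SK_EFER_SCE	(1ULL << 0)
#define SK_EFER_LME	(1ULL << 8)
#define SK_EFER_LMA	(1ULL << 10)
#define SK_EFER_NXE	(1ULL << 11)
#define SK_EFER_SVME	(1ULL << 12)
#define SK_EFER_LMSLE	(1ULL << 13)
#define SK_EFER_FFXSR	(1ULL << 14)
#define SK_EFER_TCE	(1ULL << 15)

#define SK_EFER_DEFINED	(SK_EFER_SCE | SK_EFER_LME | SK_EFER_LMA |	\
	SK_EFER_NXE | SK_EFER_SVME | SK_EFER_LMSLE | SK_EFER_FFXSR |	\
	SK_EFER_TCE)

/* Returns true if the guest's new EFER value passes the simplified checks. */
static bool
efer_write_ok(uint64_t newval, uint64_t allowed_by_cpuid)
{
	const uint64_t optional = SK_EFER_NXE | SK_EFER_FFXSR | SK_EFER_TCE;

	/* Must-Be-Zero bits cannot be set. */
	if ((newval & ~SK_EFER_DEFINED) != 0)
		return (false);
	/* LMSLE is rejected: segment limits are not enforced in 64-bit mode. */
	if ((newval & SK_EFER_LMSLE) != 0)
		return (false);
	/* Optional features may be set only if CPUID advertises them. */
	if ((newval & optional & ~allowed_by_cpuid) != 0)
		return (false);
	return (true);
}
```

In a hypervisor a rejected write would typically result in an exception being injected into the guest; the sketch only reports the verdict.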
# dbec2c5c	22-Apr-2015	Marcelo Araujo <araujo@FreeBSD.org>	Missing break in switch case. Differential Revision: D2342 Reviewed by: neel
# 7c0b0b9a	16-Apr-2015	Neel Natu <neel@FreeBSD.org>	Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly. MFC after: 1 week
# e4f605ee	24-Mar-2015	Tycho Nightingale <tychon@FreeBSD.org>	When fetching an instruction in non-64bit mode, consider the value of the code segment base address. Also if an instruction doesn't support a mod R/M (modRM) byte, don't be concerned if the CPU is in real mode. Reviewed by: neel
# 7d69783a	02-Mar-2015	Neel Natu <neel@FreeBSD.org>	Fix warnings/errors when building vmm.ko with gcc: - fix warning about comparison of 'uint8_t v_tpr >= 0' always being true. - fix error triggered by an empty clobber list in the inline assembly for "clgi" and "stgi" - fix error when compiling "vmload %rax", "vmrun %rax" and "vmsave %rax". The gcc assembler does not like the explicit operand "%rax" while the clang assembler requires specifying the operand "%rax". Fix this by encoding the instructions using the ".byte" directive. Reported by: julian MFC after: 1 week
# e09ff171	23-Jan-2015	Neel Natu <neel@FreeBSD.org>	Add macro to identify AVIC capability (advanced virtual interrupt controller) in AMD processors. Submitted by: Dmitry Luhtionov (dmitryluhtionov@gmail.com)
# c9c75df4	13-Jan-2015	Neel Natu <neel@FreeBSD.org>	'struct vm_exception' was intended to be used only as the collateral for the VM_INJECT_EXCEPTION ioctl. However it morphed into other uses like keeping track of pending exceptions for a vcpu. This in turn causes confusion because some fields in 'struct vm_exception' like 'vcpuid' make sense only in the ioctl context. It also makes it harder to add or remove structure fields. Fix this by using 'struct vm_exception' only to communicate information from userspace to vmm.ko when injecting an exception. Also, add a field 'restart_instruction' to 'struct vm_exception'. This field is set to '1' for exceptions where the faulting instruction is restarted after the exception is handled. MFC after: 1 week
# 2ce12423	06-Jan-2015	Neel Natu <neel@FreeBSD.org>	Clear blocking due to STI or MOV SS in the hypervisor when an instruction is emulated or when the vcpu incurs an exception. This matches the CPU behavior. Remove special case code in HLT processing that was clearing the interrupt shadow. This is now redundant because the interrupt shadow is always cleared when the vcpu is resumed after an instruction is emulated. Reported by: David Reed (david.reed@tidalscale.com) MFC after: 2 weeks
# cd86d363	30-Dec-2014	Neel Natu <neel@FreeBSD.org>	Initialize all fields of 'struct vm_exception exception' before passing it to vm_inject_exception(). This fixes the issue that 'exception.cpuid' is uninitialized when calling 'vm_inject_exception()'. However, in practice this change is a no-op because vm_inject_exception() does not use 'exception.cpuid' for anything. Reported by: Coverity Scan CID: 1261297 MFC after: 3 days
# 95474bc2	29-Dec-2014	Neel Natu <neel@FreeBSD.org>	Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT' on an AMD/SVM host. MFC after: 1 week
# b0538143	22-Dec-2014	Neel Natu <neel@FreeBSD.org>	Allow ktr(4) tracing of all guest exceptions via the tunable "hw.vmm.trace_guest_exceptions". To enable this feature set the tunable to "1" before loading vmm.ko. Tracing the guest exceptions can be useful when debugging guest triple faults. Note that there is a performance impact when exception tracing is enabled since every exception will now trigger a VM-exit. Also, handle machine check exceptions that happen during guest execution by vectoring to the host's machine check handler via "int $18". Discussed with: grehan MFC after: 2 weeks
# f1be09bd	27-Oct-2014	Peter Grehan <grehan@FreeBSD.org>	Remove bhyve SVM feature printf's now that they are available in the general CPU feature detection code. Reviewed by: neel
# b1cf7bb5	16-Oct-2014	Neel Natu <neel@FreeBSD.org>	Use the correct fault type (VM_PROT_EXECUTE) for an instruction fetch.
# 8fe9436d	10-Oct-2014	Neel Natu <neel@FreeBSD.org>	Get rid of unused headers. Restrict scope of malloc types M_SVM and M_SVM_VLAPIC by making them static. Replace ERR() with KASSERT(). style(9) cleanup.
# 882a1f19	10-Oct-2014	Neel Natu <neel@FreeBSD.org>	Use a consistent style for messages emitted when the module is loaded.
# 30571674	26-Sep-2014	Neel Natu <neel@FreeBSD.org>	Simplify register state save and restore across a VMRUN: - Host registers are now stored on the stack instead of a per-cpu host context. - Host %FS and %GS selectors are not saved and restored across VMRUN. - Restoring the %FS/%GS selectors was futile anyways since that only updates the low 32 bits of base address in the hidden descriptor state. - GS.base is properly updated via the MSR_GSBASE on return from svm_launch(). - FS.base is not used while inside the kernel so it can be safely ignored. - Add function prologue/epilogue so svm_launch() can be traced with Dtrace's FBT entry/exit probes. They also serve to save/restore the host %rbp across VMRUN. Reviewed by: grehan Discussed with: Anish Gupta (akgupt3@gmail.com)
# af198d88	21-Sep-2014	Neel Natu <neel@FreeBSD.org>	Allow more VMCB fields to be cached: - CR2 - CR0, CR3, CR4 and EFER - GDT/IDT base/limit fields - CS/DS/ES/SS selector/base/limit/attrib fields The caching can be further restricted via the tunable 'hw.vmm.svm.vmcb_clean'. Restructure the code such that the fields above are only modified in a single place. This makes it easy to invalidate the VMCB cache when any of these fields is modified.
# 6b844b87	16-Sep-2014	Neel Natu <neel@FreeBSD.org>	Rework vNMI injection. Keep track of NMI blocking by enabling the IRET intercept on a successful vNMI injection. The NMI blocking condition is cleared when the handler executes an IRET and traps back into the hypervisor. Don't inject NMI if the processor is in an interrupt shadow to preserve the atomic nature of "STI;HLT". Take advantage of this and artificially set the interrupt shadow to prevent NMI injection when restarting the "iret". Reviewed by: Anish Gupta (akgupt3@gmail.com), grehan
# 5fb3bc71	15-Sep-2014	Neel Natu <neel@FreeBSD.org>	Minor cleanup. Get rid of unused 'svm_feature' from the softc. Get rid of the redundant 'vcpu_cnt' checks in svm.c. There is a similar check in vmm.c against 'vm->active_cpus' before the AMD-specific code is called. Submitted by: Anish Gupta (akgupt3@gmail.com)
# 79ad53fb	15-Sep-2014	Neel Natu <neel@FreeBSD.org>	Use V_IRQ, V_INTR_VECTOR and V_TPR to offload APIC interrupt delivery to the processor. Briefly, the hypervisor sets V_INTR_VECTOR to the APIC vector and sets V_IRQ to 1 to indicate a pending interrupt. The hardware then takes care of injecting this vector when the guest is able to receive it. Legacy PIC interrupts are still delivered via the event injection mechanism. This is because the vector injected by the PIC must reflect the state of its pins at the time the CPU is ready to accept the interrupt. Accesses to the TPR via %CR8 are handled entirely in hardware. This requires that the emulated TPR must be synced to V_TPR after a #VMEXIT. The guest can also modify the TPR via the memory mapped APIC. This requires that the V_TPR must be synced with the emulated TPR before a VMRUN. Reviewed by: Anish Gupta (akgupt3@gmail.com)
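The TPR synchronization described in 79ad53fb can be pictured with the two conversions below. This is a sketch under the assumption that V_TPR, like %CR8, carries only the 4-bit priority class found in bits 7:4 of the memory mapped APIC TPR; the function names are hypothetical and do not come from svm.c.

```c
#include <stdint.h>

/* Before VMRUN: derive the 4-bit V_TPR from the emulated 8-bit APIC TPR. */
static uint8_t
apic_tpr_to_vtpr(uint8_t apic_tpr)
{
	return ((apic_tpr >> 4) & 0x0f);
}

/*
 * After #VMEXIT: rebuild the emulated APIC TPR from V_TPR. The low four
 * bits (priority sub-class) are not representable in V_TPR and read as zero.
 */
static uint8_t
vtpr_to_apic_tpr(uint8_t v_tpr)
{
	return ((uint8_t)((v_tpr & 0x0f) << 4));
}
```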
# bbadcde4	13-Sep-2014	Neel Natu <neel@FreeBSD.org>	Set the 'vmexit->inst_length' field properly depending on the type of the VM-exit and ultimately on whether nRIP is valid. This allows us to update the %rip after the emulation is finished so any exceptions triggered during the emulation will point to the right instruction. Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability is available. The effective segment field in EXITINFO1 is not valid without this capability. Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging. Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2 fields in bhyve(8). Reviewed by: Anish Gupta (akgupt3@gmail.com) Reviewed by: grehan
# 74accc31	13-Sep-2014	Neel Natu <neel@FreeBSD.org>	Bug fixes. - Don't enable the HLT intercept by default. It will be enabled by bhyve(8) if required. Prior to this change HLT exiting was always enabled making the "-H" option to bhyve(8) meaningless. - Recognize a VM exit triggered by a non-maskable interrupt. Prior to this change the exit would be punted to userspace and the virtual machine would terminate.
# fa7caa91	13-Sep-2014	Neel Natu <neel@FreeBSD.org>	style(9): insert an empty line if the function has no local variables Pointed out by: grehan
# c2a875f9	13-Sep-2014	Neel Natu <neel@FreeBSD.org>	AMD processors that have the SVM decode assist capability will store the instruction bytes in the VMCB on a nested page fault. This is useful because it saves having to walk the guest page tables to fetch the instruction. vie_init() now takes two additional parameters 'inst_bytes' and 'inst_len' that map directly to 'vie->inst[]' and 'vie->num_valid'. The instruction emulation handler skips calling 'vmm_fetch_instruction()' if 'vie->num_valid' is non-zero. The use of this capability can be turned off by setting the sysctl/tunable 'hw.vmm.svm.disable_npf_assist' to '1'. Reviewed by: Anish Gupta (akgupt3@gmail.com) Discussed with: grehan
# 442a04ca	11-Sep-2014	Neel Natu <neel@FreeBSD.org>	style(9): indent the switch, don't indent the case, indent case body one tab.
# e441104d	10-Sep-2014	Neel Natu <neel@FreeBSD.org>	Repurpose the V_IRQ interrupt injection to implement VMX-style interrupt window exiting. This simply involves setting V_IRQ and enabling the VINTR intercept. This instructs the CPU to trap back into the hypervisor as soon as an interrupt can be injected into the guest. The pending interrupt is then injected via the traditional event injection mechanism. Rework vcpu interrupt injection so that Linux guests now idle with host cpu utilization close to 0%. Reviewed by: Anish Gupta (earlier version) Discussed with: grehan
# 238b6cb7	09-Sep-2014	Neel Natu <neel@FreeBSD.org>	Allow intercepts and irq fields to be cached by the VMCB. Provide APIs svm_enable_intercept()/svm_disable_intercept() to add/delete VMCB intercepts. These APIs ensure that the VMCB state cache is invalidated when intercepts are modified. Each intercept is identified as a (index,bitmask) tuple. For e.g., the VINTR intercept is identified as (VMCB_CTRL1_INTCPT,VMCB_INTCPT_VINTR). The first 20 bytes in control area that are used to enable intercepts are represented as 'uint32_t intercept[5]' in 'struct vmcb_ctrl'. Modify svm_setcap() and svm_getcap() to use the new APIs. Discussed with: Anish Gupta (akgupt3@gmail.com)
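The (index,bitmask) scheme from 238b6cb7 might look roughly like the sketch below. Only the 'uint32_t intercept[5]' layout is taken from the commit message; `struct ctrl_sketch`, its `dirty` flag and `set_intercept()` are hypothetical simplifications of the real APIs and of the VMCB state cache.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_INTCPT_WORDS	5	/* 'uint32_t intercept[5]' per the commit */

struct ctrl_sketch {
	uint32_t intercept[NUM_INTCPT_WORDS];
	int dirty;	/* hypothetical: set when cached state must be reloaded */
};

/* Enable or disable the intercept named by (idx, mask). */
static void
set_intercept(struct ctrl_sketch *c, int idx, uint32_t mask, int enable)
{
	uint32_t old;

	assert(idx >= 0 && idx < NUM_INTCPT_WORDS);
	old = c->intercept[idx];
	if (enable)
		c->intercept[idx] |= mask;
	else
		c->intercept[idx] &= ~mask;
	/* Invalidate the cached copy only if something actually changed. */
	if (c->intercept[idx] != old)
		c->dirty = 1;
}
```

Routing every modification through one function is what makes it possible to guarantee that the cache is invalidated whenever an intercept changes.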
# e5397c9f	09-Sep-2014	Neel Natu <neel@FreeBSD.org>	Move the VMCB initialization into svm.c in preparation for changes to the interrupt injection logic. Discussed with: Anish Gupta (akgupt3@gmail.com)
# 840b1a27	09-Sep-2014	Neel Natu <neel@FreeBSD.org>	Move the event injection function into svm.c and add KTR logging for every event injection. This is in preparation for changes to SVM guest interrupt injection. Discussed with: Anish Gupta (akgupt3@gmail.com)
# 2591ee3e	09-Sep-2014	Neel Natu <neel@FreeBSD.org>	Remove a bogus check that flagged an error if the guest %rip was zero. An AP begins execution with %rip set to 0 after a startup IPI. Discussed with: Anish Gupta (akgupt3@gmail.com)
# 5e467bd0	09-Sep-2014	Neel Natu <neel@FreeBSD.org>	Make the KTR tracepoints uniform and ensure that every VM-exit is logged. Discussed with: Anish Gupta (akgupt3@gmail.com)
# a2901ce7	09-Sep-2014	Neel Natu <neel@FreeBSD.org>	Allow guest read access to MSR_EFER without hypervisor intervention. Dirty the VMCB_CACHE_CR state cache when MSR_EFER is modified.
# 501f03eb	09-Sep-2014	Neel Natu <neel@FreeBSD.org>	Remove gratuitous forward declarations. Remove tabs on empty lines.
# a2684814	06-Sep-2014	Neel Natu <neel@FreeBSD.org>	Do proper ASID management for guest vcpus. Prior to this change an ASID was hard allocated to a guest and shared by all its vcpus. This meant that the number of VMs that could be created was limited to the number of ASIDs supported by the CPU. It was also inefficient because it forced a TLB flush on every VMRUN. With this change the number of guests that can be created is independent of the number of available ASIDs. Also, the TLB is flushed only when a new ASID is allocated. Discussed with: grehan Reviewed by: Anish Gupta (akgupt3@gmail.com)
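One common way to get the properties described in a2684814 is a per-host-cpu generation counter. The sketch below illustrates that general technique and is an assumption, not a copy of what svm.c does; all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per host cpu: current generation and the next ASID to hand out. */
struct host_asid {
	uint64_t gen;	/* starts at 1 so a zeroed vcpu never matches */
	uint32_t next;	/* starts at 1; ASID 0 is reserved for the host */
	uint32_t limit;	/* number of ASIDs supported by the CPU */
};

/* Per vcpu: the ASID it holds and the generation it was issued in. */
struct vcpu_asid {
	uint64_t gen;
	uint32_t num;
};

/* Returns true when a new ASID was allocated and a TLB flush is needed. */
static bool
asid_activate(struct host_asid *h, struct vcpu_asid *v)
{
	if (v->gen == h->gen)
		return (false);		/* existing ASID is still valid */
	if (h->next >= h->limit) {
		h->gen++;		/* rollover: all older ASIDs expire */
		h->next = 1;
	}
	v->num = h->next++;
	v->gen = h->gen;
	return (true);
}
```

With this scheme the number of vcpus is not bounded by the number of ASIDs: when the numbers run out, the generation advances and vcpus simply reallocate the next time they run.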
# 38658797	04-Sep-2014	Neel Natu <neel@FreeBSD.org>	Merge svm_set_vmcb() and svm_init_vmcb() into a single function that is called just once when a vcpu is initialized. Discussed with: Anish Gupta (akgupt3@gmail.com)
# fea6bd5c	04-Sep-2014	Neel Natu <neel@FreeBSD.org>	Consolidate the code to restore the host TSS after a #VMEXIT into a single function restore_host_tss(). Don't bother to restore MSR_KGSBASE after a #VMEXIT since it is not used in the kernel. It will be restored on return to userspace. Discussed with: Anish Gupta (akgupt3@gmail.com)
# 48e8c213	24-Aug-2014	Neel Natu <neel@FreeBSD.org>	An exception is allowed to be injected even if the vcpu is in an interrupt shadow, so move the check for pending exception before bailing out due to an interrupt shadow. Change return type of 'vmcb_eventinject()' to a void and convert all error returns into KASSERTs. Fix VMCB_EXITINTINFO_EC(x) and VMCB_EXITINTINFO_TYPE(x) to do the shift before masking the result. Reviewed by: Anish Gupta (akgupt3@gmail.com)
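The shift-before-mask fix from 48e8c213 can be illustrated as follows. The macro names are hypothetical stand-ins for VMCB_EXITINTINFO_TYPE(x) and VMCB_EXITINTINFO_EC(x), and the field positions are assumed from the architectural EXITINTINFO layout (type in bits 10:8, error code in bits 63:32).

```c
#include <stdint.h>

/* Correct: move the field down to bit 0 first, then mask to its width. */
#define SK_EXITINTINFO_TYPE(x)	(((x) >> 8) & 0x7)
#define SK_EXITINTINFO_EC(x)	(((x) >> 32) & 0xFFFFFFFF)

/*
 * Broken variant for comparison: masking the low bits first and shifting
 * afterwards discards the field, so the result is always zero.
 */
#define SK_EXITINTINFO_TYPE_BROKEN(x)	(((x) & 0x7) >> 8)

/* Helper that builds a sample value with the given type and error code. */
static uint64_t
make_exitintinfo(uint8_t vector, uint8_t type, uint32_t errcode)
{
	return (((uint64_t)errcode << 32) | (1ULL << 31) | (1ULL << 11) |
	    ((uint64_t)(type & 0x7) << 8) | vector);
}
```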
# 4e98fc90	11-Jun-2014	Neel Natu <neel@FreeBSD.org>	Disable global interrupts early so all the software state maintained by bhyve is sampled "atomically". Any interrupts after this point will be held pending by the CPU until the guest starts executing and will immediately trigger a #VMEXIT. Reviewed by: Anish Gupta (akgupt3@gmail.com)
# 37871487	09-Jun-2014	Peter Grehan <grehan@FreeBSD.org>	Temporary fix for guest idle detection. Handle ExtINT injection for SVM. The HPET emulation will inject a legacy interrupt at startup, and if this isn't handled, will result in the HLT-exit code assuming there are outstanding ExtINTs and return without sleeping. svm_inj_interrupts() needs more changes to bring it up to date with the VT-x version: these are forthcoming. Reviewed by: neel
# 1cc0e0ee	07-Jun-2014	Peter Grehan <grehan@FreeBSD.org>	Allow the TSC MSR to be accessed directly from the guest.
# 0df5b8cb	05-Jun-2014	Peter Grehan <grehan@FreeBSD.org>	ins/outs support for SVM. Modelled on the Intel VT-x code. Remove CR2 save/restore - the guest restore/save is done in hardware, and there is no need to save/restore the host version (same as VT-x). Submitted by: neel (SVM segment descriptor 'P' bit code) Reviewed by: neel
# 8c1da7e6	03-Jun-2014	Peter Grehan <grehan@FreeBSD.org>	Use API call when VM is detected as suspended. This fixes the (harmless) error message on exit: vmexit_suspend: invalid reason 217645057 Reviewed by: neel, Anish Gupta (akgupt3@gmail.com)
# eee8190a	03-Jun-2014	Peter Grehan <grehan@FreeBSD.org>	Bring (almost) up-to-date with HEAD. - use the new virtual APIC page - update to current bhyve APIs Tested by Anish with multiple FreeBSD SMP VMs on a Phenom, and verified by myself with light FreeBSD VM testing on a Sempron 3850 APU. The issues reported with Linux guests are very likely to still be here, but this sync eliminates the skew between the project branch and CURRENT, and should help to determine the causes. Some follow-on commits will fix minor cosmetic issues. Submitted by: Anish Gupta (akgupt3@gmail.com)
# cde843b4	04-Feb-2014	Peter Grehan <grehan@FreeBSD.org>	Changes to the SVM code to bring it up to r259205 - Convert VMM_CTR to VCPU_CTR KTR macros - Special handling of halt, save rflags for VMM layer to emulate halt for vcpu(sleep to be awakened by interrupt or stop it) - Cleanup of RVI exit handling code Submitted by: Anish Gupta (akgupt3@gmail.com) Reviewed by: grehan
# a0b78f09	18-Dec-2013	Peter Grehan <grehan@FreeBSD.org>	Enable memory overcommit for AMD processors. - No emulation of A/D bits is required since AMD-V RVI supports A/D bits. - Enable pmap PT_RVI support(w/o PAT) which is required for memory over-commit support. - Other minor fixes: * Make use of VMCB EXITINTINFO field. If a #VMEXIT happens while delivering an interrupt, EXITINTINFO has all the details that bhyve needs to inject the same interrupt. * SVM h/w decode assist code was incomplete - removed for now. * Some minor code clean-up (more coming). Submitted by: Anish Gupta (akgupt3@gmail.com)
# ab76fd58	21-Oct-2013	Neel Natu <neel@FreeBSD.org>	The ASID allocation in SVM is incorrect because it allocates a single ASID for all vcpus belonging to a guest. This means that when different vcpus belonging to the same guest are executing on the same host cpu there may be "leakage" in the mappings created by one vcpu to another. The proper fix for this is being worked on and will be committed shortly. In the meantime work around this bug by flushing the guest TLB entries on every VM entry. Submitted by: Anish Gupta (akgupt3@gmail.com)
# 4599af43	15-Oct-2013	Peter Grehan <grehan@FreeBSD.org>	Fix SVM handling of ASTPENDING, which manifested as a hang on console output (due to a missing interrupt). SVM does exit processing and then handles ASTPENDING which overwrites the already handled SVM exit cause and corrupts virtual machine state. For example, if the SVM exit was due to an I/O port access but the main loop detected an ASTPENDING, the exit would be processed as ASTPENDING and leave the device (e.g. emulated UART) for that I/O port in bad state. Submitted by: Anish Gupta (akgupt3@gmail.com) Reviewed by: grehan
# df5e6de3	22-Aug-2013	Peter Grehan <grehan@FreeBSD.org>	Add in last remaining files to get AMD-SVM operational. Submitted by: Anish Gupta (akgupt3@gmail.com)