History log of /freebsd-current/sys/amd64/include/vmm.h
Revision Date Author Comments
# 1eedb4e5 11-Apr-2024 Elyes Haouas <ehaouas@noos.fr>

vmm: Fix typo

Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/885


# f493ea65 07-Feb-2024 Mark Johnston <markj@FreeBSD.org>

vmm: Expose more registers to VM_GET_REGISTER

In a follow-up revision the gdb stub will support sending an XML target
description to gdb, which lets us send additional registers, including
the ones added in this patch.

Reviewed by: jhb
MFC after: 1 month
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D43665


# 7c8f1631 20-Dec-2023 Konstantin Belousov <kib@FreeBSD.org>

vmm.h: remove dup declaration

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43139


# e3b4fe64 07-Dec-2023 Bojan Novković <bojan.novkovic@fer.hr>

vmm: implement single-stepping for AMD CPUs

This patch implements single-stepping for AMD CPUs using the RFLAGS.TF
single-stepping mechanism. The GDB stub requests single-stepping
using the VM_CAP_RFLAGS_TF capability. Setting this capability will
set the RFLAGS.TF bit on the selected vCPU, activate DB exception
intercepts, and activate POPF/PUSH instruction intercepts. The
resulting DB exception is then caught by the IDT_DB vmexit handler and
bounced to userland where it is processed by the GDB stub. This patch
also makes sure that the value of the TF bit is correctly updated and
that it is not erroneously propagated into memory. Stepping over PUSHF
will cause the vm_handle_db function to correct the pushed RFLAGS
value and stepping over POPF will update the shadowed TF bit copy.

Reviewed by: jhb
Sponsored by: Google, Inc. (GSoC 2022)
Differential Revision: https://reviews.freebsd.org/D42296


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# e17eca32 23-May-2023 Mark Johnston <markj@FreeBSD.org>

vmm: Avoid embedding cpuset_t ioctl ABIs

Commit 0bda8d3e9f7a ("vmm: permit some IPIs to be handled by userspace")
embedded cpuset_t into the vmm(4) ioctl ABI. This was a mistake since
we otherwise have some leeway to change the cpuset_t for the whole
system, but we want to keep the vmm ioctl ABI stable.

Rework IPI reporting to avoid this problem. Along the way, make VM_RUN
a bit more efficient:
- Split vmexit metadata out of the main VM_RUN structure. This data is
only written by the kernel.
- Have userspace pass a cpuset_t pointer and cpusetsize in the VM_RUN
structure, as is done for cpuset syscalls.
- Have the destination CPU mask for VM_EXITCODE_IPIs live outside the
vmexit info structure, and make VM_RUN copy it out separately. Zero
out any extra bytes in the CPU mask, like cpuset syscalls do.
- Modify the vmexit handler prototype to take a full VM_RUN structure.

PR: 271330
Reviewed by: corvink, jhb (previous versions)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40113


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# fefac543 09-May-2023 Bojan Novković <bojan.novkovic@fer.hr>

bhyve: fix vCPU single-stepping on VMX

This patch fixes virtual machine single stepping on VMX hosts.

Currently, when using bhyve's gdb stub, each attempt at single-stepping
a vCPU lands in a timer interrupt. The current single-stepping mechanism
uses the Monitor Trap Flag feature to cause VMEXIT after a single
instruction is executed. Unfortunately, the SDM states that MTF causes
VMEXITs for the next instruction that gets executed, which is often not
what the person using the debugger expects. [1]

This patch adds a new VM capability that masks interrupts on a vCPU by
blocking interrupt injection and modifies the gdb stub to use the newly
added capability while single-stepping a vCPU.

[1] Intel SDM 26.5.2 Vol. 3C

Reviewed by: corvink, jbh
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39949


# 7d9ef309 24-Mar-2023 John Baldwin <jhb@FreeBSD.org>

libvmmapi: Add a struct vcpu and use it in most APIs.

This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.

struct vcpu is an opaque type managed by libvmmapi. For now it stores
a pointer to the VM context and an integer id.

As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.

Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.

While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().

Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server. The underyling ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38124


# 8104fc31 28-Feb-2023 Vitaliy Gusev <gusev.vitaliy@gmail.com>

bhyve: fix restore of kernel structs

vmx_snapshot() and svm_snapshot() do not save any data and error occurs at
resume:

Restoring kernel structs...
vm_restore_kern_struct: Kernel struct size was 0 for: vmx
Failed to restore kernel structs.

Reviewed by: corvink, markj
Fixes: 39ec056e6dbd89e26ee21d2928dbd37335de0ebc ("vmm: Rework snapshotting of CPU-specific per-vCPU data.")
MFC after: 2 weeks
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D38476


# 892feec2 15-Nov-2022 Corvin Köhne <corvink@FreeBSD.org>

vmm: avoid spurious rendezvous

A vcpu only checks if a rendezvous is in progress or not to decide if it
should handle a rendezvous. This could lead to spurios rendezvous where
a vcpu tries a handle a rendezvous it isn't part of. This situation is
properly handled by vm_handle_rendezvous but it could potentially
degrade the performance. Avoid that by an early check if the vcpu is
part of the rendezvous or not.

At the moment, rendezvous are only used to spin up application
processors and to send ioapic interrupts. Spinning up application
processors is done in the guest boot phase by sending INIT SIPI
sequences to single vcpus. This is known to cause spurious rendezvous
and only occurs in the boot phase. Sending ioapic interrupts is rare
because modern guest will use msi and the rendezvous is always send to
all vcpus.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D37390


# 1f6db5d6 30-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Remove stale comment for vm_rendezvous.

Support for rendezvous outside of a vcpu context (vcpuid of -1) was
removed in commit 949f0f47a4e7, and the vm, vcpuid argument pair was
replaced by a single struct vcpu pointer in commit d8be3d523dd5.

Reported by: andrew


# ca6b48f0 18-Nov-2022 Mark Johnston <markj@FreeBSD.org>

vmm: Restore the correct vm_inject_*() prototypes

Fixes: 80cb5d845b8f ("vmm: Pass vcpu instead of vm and vcpuid...")
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D37443


# ee98f99d 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Convert VM_MAXCPU into a loader tunable hw.vmm.maxcpu.

The default is now the number of physical CPUs in the system rather
than 16.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37175


# 98568a00 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Allocate vCPUs on first use of a vCPU.

Convert the vcpu[] array in struct vm to an array of pointers and
allocate vCPUs on first use. This avoids always allocating VM_MAXCPU
vCPUs for each VM, but instead only allocates the vCPUs in use. A new
per-VM sx lock is added to serialize attempts to allocate vCPUs on
first use. However, a given vCPU is never freed while the VM is
active, so the pointer is read via an unlocked read first to avoid the
need for the lock in the common case once the vCPU has been created.

Some ioctls need to lock all vCPUs. To prevent races with ioctls that
want to allocate a new vCPU, these ioctls also lock the sx lock that
protects vCPU creation.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37174


# c0f35dbf 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.

Retire the boot_state member of struct vlapic and instead use a cpuset
in the VM to track vCPUs waiting for STARTUP IPIs. INIT IPIs add
vCPUs to this set, and STARTUP IPIs remove vCPUs from the set.
STARTUP IPIs are only reported to userland for vCPUs that were removed
from the set.

In particular, this permits a subsequent change to allocate vCPUs on
demand when the vCPU may not be allocated until after a STARTUP IPI is
reported to userland.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37173


# 67b69e76 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Use an sx lock to protect the memory map.

Previously bhyve obtained a "read lock" on the memory map for ioctls
needing to read the map by locking the last vCPU. This is now
replaced by a new per-VM sx lock. Modifying the map requires
exclusively locking the sx lock as well as locking all existing vCPUs.
Reading the map requires either locking one vCPU or the sx lock.

This permits safely modifying or querying the memory map while some
vCPUs do not exist which will be true in a future commit.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37172


# 3f0f4b15 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Lookup vcpu pointers in vmmdev_ioctl.

Centralize mapping vCPU IDs to struct vcpu objects in vmmdev_ioctl and
pass vcpu pointers to the routines in vmm.c. For operations that want
to perform an action on all vCPUs or on a single vCPU, pass pointers
to both the VM and the vCPU using a NULL vCPU pointer to request
global actions.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37168


# d8be3d52 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Use struct vcpu in the rendezvous code.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37165


# 80cb5d84 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Pass vcpu instead of vm and vcpuid to APIs used from CPU backends.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37162


# d3956e46 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Use struct vcpu in the instruction emulation code.

This passes struct vcpu down in place of struct vm and and integer
vcpu index through the in-kernel instruction emulation code. To
minimize userland disruption, helper macros are used for the vCPU
arguments passed into and through the shared instruction emulation
code.

A few other APIs used by the instruction emulation code have also been
updated to accept struct vcpu in the kernel including
vm_get/set_register and vm_inject_fault.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37161


# 28b561ad 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Add vm_gpa_hold_global wrapper function.

This handles the case that guest pages are being held not on behalf of
a virtual CPU but globally. Previously this was handled by passing a
vcpuid of -1 to vm_gpa_hold, but that will not work in the future when
vm_gpa_hold is changed to accept a struct vcpu pointer.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37160


# 2b4fe856 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove unused vm and vcpu arguments from vm_copy routines.

The arguments identifying the VM and vCPU are only needed for
vm_copy_setup.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37158


# 3dc3d32a 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Use struct vcpu with the vmm_stat API.

The function callbacks still use struct vm and and vCPU index.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37157


# 950af9ff 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Expose struct vcpu as an opaque type.

Pass a pointer to the current struct vcpu to the vcpu_init callback
and save this pointer in the CPU-specific vcpu structures.

Add routines to fetch a struct vcpu by index from a VM and to query
the VM and vcpuid from a struct vcpu.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37156


# 869c8d19 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Remove the per-vm cookie argument from vmmops taking a vcpu.

This requires storing a reference to the per-vm cookie in the
CPU-specific vCPU structure. Take advantage of this new field to
remove no-longer-needed function arguments in the CPU-specific
backends. In particular, stop passing the per-vm cookie to functions
that either don't use it or only use it for KTR traces.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37152


# 1aa51504 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Refactor storage of CPU-dependent per-vCPU data.

Rather than storing static arrays of per-vCPU data in the CPU-specific
per-VM structure, adopt a more dynamic model similar to that used to
manage CPU-specific per-VM data.

That is, add new vmmops methods to init and cleanup a single vCPU.
The init method returns a pointer that is stored in 'struct vcpu' as a
cookie pointer. This cookie pointer is now passed to other vmmops
callbacks in place of the integer index. The index is now only used
in KTR traces and when calling back into the CPU-independent layer.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37151


# 39ec056e 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Rework snapshotting of CPU-specific per-vCPU data.

Previously some per-vCPU state was saved in vmmops_snapshot and other
state was saved in vmmops_vcmx_snapshot. Consolidate all per-vCPU
state into the latter routine and rename the hook to the more generic
'vcpu_snapshot'. Note that the CPU-independent per-vCPU data is still
stored in a separate blob as well as the per-vCPU local APIC data.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37150


# 0bda8d3e 07-Sep-2022 Corvin Köhne <CorvinK@beckhoff.com>

vmm: permit some IPIs to be handled by userspace

Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland.
INIT and STARTUP IPIs are now returned to userland. Due to backward
compatibility reasons, a new capability is added for enabling
VM_EXITCODE_IPI.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D35623
Sponsored by: Beckhoff Automation GmbH & Co. KG


# 3fc17484 09-Sep-2022 Emmanuel Vadot <manu@FreeBSD.org>

Revert "vmm: permit some IPIs to be handled by userspace"

This reverts commit a5a918b7a906eaa88e0833eac70a15989d535b02.

This cause some problem with vm using bhyveload.

Reported by: pho, kp


# a5a918b7 07-Sep-2022 Corvin Köhne <CorvinK@beckhoff.com>

vmm: permit some IPIs to be handled by userspace

Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland.
INIT and Startup IPIs are now returned to userland. Due to backward
compatibility reasons, a new capability is added for enabling
VM_EXITCODE_IPI.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35623
Sponsored by: Beckhoff Automation GmbH & Co. KG


# c6d31b83 18-Jul-2022 Konstantin Belousov <kib@FreeBSD.org>

AST: rework

Make most AST handlers dynamically registered. This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it. For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.

Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit. For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.

The dynamic registration also allows third-party modules to register AST
handlers if needed. There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it. In fact, this
is already present behavior for hwpmc.ko and ufs.ko. I do not think it
is worth the efforts and the runtime overhead to try to fix it.

Reviewed by: markj
Tested by: emaste (arm64), pho
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888


# 3ba952e1 30-May-2022 Corvin Köhne <CorvinK@beckhoff.com>

vmm: add tunable to trap WBINVD

x86 is cache coherent. However, there are special cases where cache
coherency isn't ensured (e.g. when switching the caching mode). In these
cases, WBINVD can be used. WBINVD writes all cache lines back into main
memory and invalidates the whole cache.

Due to the invalidation of the whole cache, WBINVD is a very heavy
instruction and degrades the performance on all cores. So, we should
minimize the use of WBINVD as much as possible.

In a virtual environment, the WBINVD call is mostly useless. The guest
isn't able to break cache coherency because he can't switch the physical
cache mode. When using pci passthrough WBINVD might be useful.

Nevertheless, trapping and ignoring WBINVD is an unsafe operation. For
that reason, we implement it as tunable.

Reviewed by: jhb
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D35253


# f1d450dd 09-Mar-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove VM_MAXCPU from the userspace API/ABI.

Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D34494


# f8a6ec2d 18-Mar-2021 D Scott Phillips <scottph@FreeBSD.org>

bhyve: support relocating fbuf and passthru data BARs

We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.

We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.

[1]: https://github.com/freebsd/uefi-edk2/pull/9/

Originally by: scottph
MFC After: 1 week
Sponsored by: Intel Corporation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D24066


# 15add60d 27-Nov-2020 Peter Grehan <grehan@FreeBSD.org>

Convert vmm_ops calls to IFUNC

There is no need for these to be function pointers since they are
never modified post-module load.

Rename AMD/Intel ops to be more consistent.

Submitted by: adam_fenn.io
Reviewed by: markj, grehan
Approved by: grehan (bhyve)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D27375


# 543769bf 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

amd64: clean up empty lines in .c and .h files


# f3eb12e4 23-Aug-2020 Konstantin Belousov <kib@FreeBSD.org>

Add bhyve support for LA57 guest mode.

Noted and reviewed by: grehan
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D25273


# f5f5f1e7 18-Aug-2020 Peter Grehan <grehan@FreeBSD.org>

Support guest rdtscp and rdpid instructions on Intel VT-x

Enable any of rdtscp and/or rdpid for bhyve guests on Intel-based hosts
that support the "enable RDTSCP" VM-execution control.

Submitted by: adam_fenn.io
Reported by: chuck
Reviewed by: chuck, grehan, jhb
Approved by: jhb (bhyve), grehan
MFC after: 3 weeks
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D26003


# 4daa95f8 24-Jun-2020 Conrad Meyer <cem@FreeBSD.org>

bhyve(8): For prototyping, reattempt decode in userspace

If userspace has a newer bhyve than the kernel, it may be able to decode
and emulate some instructions vmm.ko is unaware of. In this scenario,
reset decoder state and try again.

Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24464


# 483d953a 04-May-2020 John Baldwin <jhb@FreeBSD.org>

Initial support for bhyve save and restore.

Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495


# cfdea69d 21-Apr-2020 Conrad Meyer <cem@FreeBSD.org>

vmm(4): Decode 3-byte VEX-prefixed instructions

Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24462


# 497cb925 17-Apr-2020 Conrad Meyer <cem@FreeBSD.org>

vmm.h: Add ABI assertions and mark implicit holes

The static assertions were added (with size and offsets from gdb) and verified
with a build prior to marking the holes explicitly.

This is in preparation for a subsequent revision, pending in phabricator, that
makes use of some of these unused bits without impacting the ABI.

Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24461


# b837dadd 02-Jan-2020 Konstantin Belousov <kib@FreeBSD.org>

bhyve: terminate waiting loops if thread suspension is requested.

PR: 242724
Reviewed by: markj
Reported and tested by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
(previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22881


# cbd03a9d 13-Dec-2019 John Baldwin <jhb@FreeBSD.org>

Support software breakpoints in the debug server on Intel CPUs.

- Allow the userland hypervisor to intercept breakpoint exceptions
(BP#) in the guest. A new capability (VM_CAP_BPT_EXIT) is used to
enable this feature. These exceptions are reported to userland via
a new VM_EXITCODE_BPT that includes the length of the original
breakpoint instruction. If userland wishes to pass the exception
through to the guest, it must be explicitly re-injected via
vm_inject_exception().

- Export VMCS_ENTRY_INST_LENGTH as a VM_REG_GUEST_ENTRY_INST_LENGTH
pseudo-register. Injecting a BP# on Intel requires setting this to
the length of the breakpoint instruction. AMD SVM currently ignores
writes to this register (but reports success) and fails to read it.

- Rework the per-vCPU state tracked by the debug server. Rather than
a single 'stepping_vcpu' global, add a structure for each vCPU that
tracks state about that vCPU ('stepping', 'stepped', and
'hit_swbreak'). A global 'stopped_vcpu' tracks which vCPU is
currently reporting an event. Event handlers for MTRAP and
breakpoint exits loop until the associated event is reported to the
debugger.

Breakpoint events are discarded if the breakpoint is not present
when a vCPU resumes in the breakpoint handler to retry submitting
the breakpoint event.

- Maintain a linked-list of active breakpoints in response to the GDB
'Z0' and 'z0' packets.

Reviewed by: markj (earlier version)
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D20309


# 490d56c5 31-Jul-2019 Ed Maste <emaste@FreeBSD.org>

vmx: use C99 bool, not boolean_t

Bhyve's vmm is a self-contained modern component and thus a good
candidate for use of C99 types.

Reviewed by: jhb, kib, markj, Patrick Mooney
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21036


# 422a8a4d 12-Jul-2019 Scott Long <scottl@FreeBSD.org>

Tie the name limit of a VM to SPECNAMELEN from devfs instead of a
hard-coded value. Don't allocate space for it from the kernel stack.
Account for prefix, suffix, and separator space in the name. This
takes the effective length up to 229 bytes on 13-current, and 37 bytes
on 12-stable. 37 bytes is enough to hold a full GUID string.

PR: 234134
MFC after: 1 week
Differential Revision: http://reviews.freebsd.org/D20924


# a488c9c9 25-Apr-2019 Rodney W. Grimes <rgrimes@FreeBSD.org>

Add accessor function for vm->maxcpus

Replace most VM_MAXCPU constant useses with an accessor function to
vm->maxcpus which for now is initialized and kept at the value of
VM_MAXCPUS.

This is a rework of Fabian Freyer (fabian.freyer_physik.tu-berlin.de)
work from D10070 to adjust it for the cpu topology changes that
occured in r332298

Submitted by: Fabian Freyer (fabian.freyer_physik.tu-berlin.de)
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: bde (mentor), jhb (maintainer)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D18755


# 27d26457 27-Sep-2018 Andrew Turner <andrew@FreeBSD.org>

Handle a guest executing a vm instruction by trapping and raising an
undefined instruction exception. Previously we would exit the guest,
however an unprivileged user could execute these.

Found with: syzkaller
Reviewed by: araujo, tychon (previous version)
Approved by: re (kib)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D17192


# 147d12a7 15-May-2018 Antoine Brodin <antoine@FreeBSD.org>

vmmdev: return EFAULT when trying to read beyond VM system memory max address

Currently, when using dd(1) to take a VM memory image, the capture never ends,
reading zeroes when it's beyond VM system memory max address.
Return EFAULT when trying to read beyond VM system memory max address.

Reviewed by: imp, grehan, anish
Approved by: grehan
Differential Revision: https://reviews.freebsd.org/D15156


# 6ac73777 13-Apr-2018 Tycho Nightingale <tychon@FreeBSD.org>

Add SDT probes to vmexit on Intel.

Submitted by: domagoj.stolfa_gmail.com
Reviewed by: grehan, tychon
Sponsored by: DARPA/AFRL
Differential Revision: https://reviews.freebsd.org/D14656


# 01d822d3 08-Apr-2018 Rodney W. Grimes <rgrimes@FreeBSD.org>

Add the ability to control the CPU topology of created VMs
from userland without the need to use sysctls, it allows the old
sysctls to continue to function, but deprecates them at
FreeBSD_version 1200060 (Relnotes for deprecate).

The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctl's are maintained in a backwards compatible way.

Added command option looks like:
bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single
integer invokes the backwards compatible parse. [,maxcpus=n] is
hidden by #ifdef until kernel support is added, though the api
is put in place.

bhyvectl --get-cpu-topology option added.

Reviewed by: grehan (maintainer, earlier version),
Reviewed by: bcr (manpages)
Approved by: bde (mentor), phk (mentor)
Tested by: Oleg Ginzburg <olevole@olevole.ru> (cbsd)
MFC after: 1 week
Relnotes: Y
Differential Revision: https://reviews.freebsd.org/D9930


# fc276d92 06-Apr-2018 John Baldwin <jhb@FreeBSD.org>

Add a way to temporarily suspend and resume virtual CPUs.

This is used as part of implementing run control in bhyve's debug
server. The hypervisor now maintains a set of "debugged" CPUs.
Attempting to run a debugged CPU will fail to execute any guest
instructions and will instead report a VM_EXITCODE_DEBUG exit to
the userland hypervisor. Virtual CPUs are placed into the debugged
state via vm_suspend_cpu() (implemented via a new VM_SUSPEND_CPU ioctl).
Virtual CPUs can be resumed via vm_resume_cpu() (VM_RESUME_CPU ioctl).

The debug server suspends virtual CPUs when it wishes them to stop
executing in the guest (for example, when a debugger attaches to the
server). The debug server can choose to resume only a subset of CPUs
(for example, when single stepping) or it can choose to resume all
CPUs. The debug server must explicitly mark a CPU as resumed via
vm_resume_cpu() before the virtual CPU will successfully execute any
guest instructions.

Reviewed by: avg, grehan
Tested on: Intel (jhb), AMD (avg)
Differential Revision: https://reviews.freebsd.org/D14466


# 490768e2 07-Mar-2018 Tycho Nightingale <tychon@FreeBSD.org>

Fix a lock recursion introduced in r327065.

Reported by: kmacy
Reviewed by: grehan, jhb
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14548


# 65eefbe4 17-Jan-2018 John Baldwin <jhb@FreeBSD.org>

Save and restore guest debug registers.

Currently most of the debug registers are not saved and restored
during VM transitions allowing guest and host debug register values to
leak into the opposite context. One result is that hardware
watchpoints do not work reliably within a guest under VT-x.

Due to differences in SVM and VT-x, slightly different approaches are
used.

For VT-x:

- Enable debug register save/restore for VM entry/exit in the VMCS for
DR7 and MSR_DEBUGCTL.
- Explicitly save DR0-3,6 of the guest.
- Explicitly save DR0-3,6-7, MSR_DEBUGCTL, and the trap flag from
%rflags for the host. Note that because DR6 is "software" managed
and not stored in the VMCS a kernel debugger which single steps
through VM entry could corrupt the guest DR6 (since a single step
trap taken after loading the guest DR6 could alter the DR6
register). To avoid this, explicitly disable single-stepping via
the trace flag before loading the guest DR6. A determined debugger
could still defeat this by setting a breakpoint after the guest DR6
was loaded and then single-stepping.

For SVM:
- Enable debug register caching in the VMCB for DR6/DR7.
- Explicitly save DR0-3 of the guest.
- Explicitly save DR0-3,6-7, and MSR_DEBUGCTL for the host. Since SVM
saves the guest DR6 in the VMCB, the race with single-stepping
described for VT-x does not exist.

For both platforms, expose all of the guest DRx values via --get-drX
and --set-drX flags to bhyvectl.

Discussed with: avg, grehan
Tested by: avg (SVM), myself (VT-x)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D13229


# c49761dd 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/amd64: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# edafb5a3 03-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/amd64: Small spelling fixes.

No functional change.


# 9b1aa8d6 18-Jun-2015 Neel Natu <neel@FreeBSD.org>

Restructure memory allocation in bhyve to support "devmem".

devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.

devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.

Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).

Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D2762
MFC after: 4 weeks


# 248e6799 28-May-2015 Neel Natu <neel@FreeBSD.org>

Fix non-deterministic delays when accessing a vcpu that was in "running" or
"sleeping" state. This is done by forcing the vcpu to transition to "idle"
by returning to userspace with an exit code of VM_EXITCODE_REQIDLE.

MFC after: 2 weeks


# ede04033 06-May-2015 Neel Natu <neel@FreeBSD.org>

Check 'td_owepreempt' and yield the vcpu thread if it is set.

This is done explicitly because a vcpu thread can be in a critical section
for the entire time slice alloted to it. This in turn can delay the handling
of the 'td_owepreempt'.

Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D2430


# 9c4d5478 06-May-2015 Neel Natu <neel@FreeBSD.org>

Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().

Prior to this change both functions returned 0 for success, -1 for failure
and +1 to indicate that an exception was injected into the guest.

The numerical value of ERESTART also happens to be -1 so when these functions
returned -1 it had to be translated to a positive errno value to prevent the
VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce
bugs when writing emulation code.

Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if
an exception was delivered to the guest. The return value is 0 or EFAULT so
no additional translation is needed.

Reviewed by: tychon
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D2428


# 8325ce5c 30-Apr-2015 Neel Natu <neel@FreeBSD.org>

Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.

Only a subset of source files that include <machine/vmm.h> need to use the
APIs that require the inclusion of <sys/cpuset.h>.

MFC after: 1 week


# e4f605ee 24-Mar-2015 Tycho Nightingale <tychon@FreeBSD.org>

When fetching an instruction in non-64bit mode, consider the value of the
code segment base address.

Also if an instruction doesn't support a mod R/M (modRM) byte, don't
be concerned if the CPU is in real mode.

Reviewed by: neel


# 75346353 18-Jan-2015 Neel Natu <neel@FreeBSD.org>

MOVS instruction emulation.

These instructions are emitted by 'bus_space_read_region()' when accessing
MMIO regions.

Since MOVS can be used with a repeat prefix start decoding the REPZ and
REPNZ prefixes. Also start decoding the segment override prefix since MOVS
allows overriding the source operand segment register.

Tested by: tychon
MFC after: 1 week


# c9c75df4 13-Jan-2015 Neel Natu <neel@FreeBSD.org>

'struct vm_exception' was intended to be used only as the collateral for the
VM_INJECT_EXCEPTION ioctl. However it morphed into other uses like keeping
track pending exceptions for a vcpu. This in turn causes confusion because
some fields in 'struct vm_exception' like 'vcpuid' make sense only in the
ioctl context. It also makes it harder to add or remove structure fields.

Fix this by using 'struct vm_exception' only to communicate information
from userspace to vmm.ko when injecting an exception.

Also, add a field 'restart_instruction' to 'struct vm_exception'. This
field is set to '1' for exceptions where the faulting instruction is
restarted after the exception is handled.

MFC after: 1 week


# 0dafa5cd 30-Dec-2014 Neel Natu <neel@FreeBSD.org>

Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko.

The new RTC emulation supports all interrupt modes: periodic, update ended
and alarm. It is also capable of maintaining the date/time and NVRAM contents
across virtual machine reset. Also, the date/time fields can now be modified
by the guest.

Since bhyve now emulates both the PIT and the RTC there is no need for
"Legacy Replacement Routing" in the HPET so get rid of it.

The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>

Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D1385
MFC after: 2 weeks


# b0538143 22-Dec-2014 Neel Natu <neel@FreeBSD.org>

Allow ktr(4) tracing of all guest exceptions via the tunable
"hw.vmm.trace_guest_exceptions". To enable this feature set the tunable
to "1" before loading vmm.ko.

Tracing the guest exceptions can be useful when debugging guest triple faults.

Note that there is a performance impact when exception tracing is enabled
since every exception will now trigger a VM-exit.

Also, handle machine check exceptions that happen during guest execution
by vectoring to the host's machine check handler via "int $18".

Discussed with: grehan
MFC after: 2 weeks


# 160ef77a 25-Oct-2014 Neel Natu <neel@FreeBSD.org>

Move the ACPI PM timer emulation into vmm.ko.

This reduces variability during timer calibration by keeping the emulation
"close" to the guest. Additionally having all timer emulations in the kernel
will ease the transition to a per-VM clock source (as opposed to using the
host's uptime keep track of time).

Discussed with: grehan


# 65145c7f 06-Oct-2014 Neel Natu <neel@FreeBSD.org>

Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'.

The hypervisor hides the MONITOR/MWAIT capability by unconditionally setting
CPUID.01H:ECX[3] to 0 so the guest should not expect these instructions to
be present anyways.

Discussed with: grehan


# c3498942 19-Sep-2014 Neel Natu <neel@FreeBSD.org>

Restructure the MSR handling so it is entirely handled by processor-specific
code. There are only a handful of MSRs common between the two so there isn't
too much duplicate functionality.

The VT-x code has the following types of MSRs:

- MSRs that are unconditionally saved/restored on every guest/host context
switch (e.g., MSR_GSBASE).

- MSRs that are restored to guest values on entry to vmx_run() and saved
before returning. This is an optimization for MSRs that are not used in
host kernel context (e.g., MSR_KGSBASE).

- MSRs that are emulated and every access by the guest causes a trap into
the hypervisor (e.g., MSR_IA32_MISC_ENABLE).

Reviewed by: grehan


# bbadcde4 13-Sep-2014 Neel Natu <neel@FreeBSD.org>

Set the 'vmexit->inst_length' field properly depending on the type of the
VM-exit and ultimately on whether nRIP is valid. This allows us to update
the %rip after the emulation is finished so any exceptions triggered during
the emulation will point to the right instruction.

Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability
is available. The effective segment field in EXITINFO1 is not valid without
this capability.

Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the
VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging.

Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2
fields in bhyve(8).

Reviewed by: Anish Gupta (akgupt3@gmail.com)
Reviewed by: grehan


# d1819632 12-Sep-2014 Neel Natu <neel@FreeBSD.org>

Optimize the common case of injecting an interrupt into a vcpu after a HLT
by explicitly moving it out of the interrupt shadow. The hypervisor is done
"executing" the HLT and by definition this moves the vcpu out of the
1-instruction interrupt shadow.

Prior to this change the interrupt would be held pending because the VMCS
guest-interruptibility-state would indicate that "blocking by STI" was in
effect. This resulted in an unnecessary round trip into the guest before
the pending interrupt could be injected.

Reviewed by: grehan


# 7f21538b 23-Aug-2014 Peter Grehan <grehan@FreeBSD.org>

Change __inline style to be consistent with FreeBSD usage,
and also fix gcc build (on STABLE, when MFCd).

PR: 192880
Reviewed by: neel
Reported by: ngie
MFC after: 1 day


# f008d157 25-Jul-2014 Neel Natu <neel@FreeBSD.org>

If a vcpu has issued a HLT instruction with interrupts disabled then it sleeps
forever in vm_handle_hlt().

This is usually not an issue as long as one of the other vcpus properly resets
or powers off the virtual machine. However, if the bhyve(8) process is killed
with a signal the halted vcpu cannot be woken up because it's sleep cannot be
interrupted.

Fix this by waking up periodically and returning from vm_handle_hlt() if
TDF_ASTPENDING is set.

Reported by: Leon Dang
Sponsored by: Nahanni Systems


# d37f2adb 23-Jul-2014 Neel Natu <neel@FreeBSD.org>

Fix fault injection in bhyve.

The faulting instruction needs to be restarted when the exception handler
is done handling the fault. bhyve now does this correctly by setting
'vmexit[vcpu].inst_length' to zero so the %rip is not advanced.

A minor complication is that the fault injection APIs are used by instruction
emulation code that is shared by vmm.ko and bhyve. Thus the argument that
refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to
be loosely typed as a 'void *'.


# d665d229 22-Jul-2014 Neel Natu <neel@FreeBSD.org>

Emulate instructions emitted by OpenBSD/i386 version 5.5:
- CMP REG, r/m
- MOV AX/EAX/RAX, moffset
- MOV moffset, AX/EAX/RAX
- PUSH r/m


# 091d4532 19-Jul-2014 Neel Natu <neel@FreeBSD.org>

Handle nested exceptions in bhyve.

A nested exception condition arises when a second exception is triggered while
delivering the first exception. Most nested exceptions can be handled serially
but some are converted into a double fault. If an exception is generated during
delivery of a double fault then the virtual machine shuts down as a result of
a triple fault.

vm_exit_intinfo() is used to record that a VM-exit happened while an event was
being delivered through the IDT. If an exception is triggered while handling
the VM-exit it will be treated like a nested exception.

vm_entry_intinfo() is used by processor-specific code to get the event to be
injected into the guest on the next VM-entry. This function is responsible for
deciding the disposition of nested exceptions.


# 3d5444c8 16-Jul-2014 Neel Natu <neel@FreeBSD.org>

Add emulation for legacy x86 task switching mechanism.

FreeBSD/i386 uses task switching to handle double fault exceptions and this
change enables that to work.

Reported by: glebius


# f7a9f178 15-Jul-2014 Neel Natu <neel@FreeBSD.org>

Add support for operand size and address size override prefixes in bhyve's
instruction emulation [1].

Fix bug in emulation of opcode 0x8A where the destination is a legacy high
byte register and the guest vcpu is in 32-bit mode. Prior to this change
instead of modifying %ah, %bh, %ch or %dh the emulation would end up
modifying %spl, %bpl, %sil or %dil instead.

Add support for moffsets by treating it as a 2, 4 or 8 byte immediate value
during instruction decoding.

Fix bug in verify_gla() where the linear address computed after decoding
the instruction was not being truncated to the effective address size [2].

Tested by: Leon Dang [1]
Reported by: Peter Grehan [2]
Sponsored by: Nahanni Systems


# b301b9e2 08-Jul-2014 Neel Natu <neel@FreeBSD.org>

Accurately identify the vcpu's operating mode as 64-bit, compatibility,
protected or real.


# 5ebc578b 10-Jun-2014 Tycho Nightingale <tychon@FreeBSD.org>

Replace enum forward declarations with complete definitions.

Reviewed by: neel


# 40487465 10-Jun-2014 Neel Natu <neel@FreeBSD.org>

Add helper functions to populate VM exit information for rendezvous and
astpending exits. This is to reduce code duplication between VT-x and
SVM implementations.


# 5fcf252f 07-Jun-2014 Neel Natu <neel@FreeBSD.org>

Add ioctl(VM_REINIT) to reinitialize the virtual machine state maintained
by vmm.ko. This allows the virtual machine to be restarted without having
to destroy it first.

Reviewed by: grehan


# 95ebc360 31-May-2014 Neel Natu <neel@FreeBSD.org>

Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing
it implicitly in vmm.ko.

Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus
and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and
"--get-suspended-cpus" options.

This is in preparation for being able to reset virtual machine state without
having to destroy and recreate it.


# 65ffa035 26-May-2014 Neel Natu <neel@FreeBSD.org>

Add segment protection and limits violation checks in vie_calculate_gla()
for 32-bit x86 guests.

Tested using ins/outs executed in a FreeBSD/i386 guest.


# 5382c19d 24-May-2014 Neel Natu <neel@FreeBSD.org>

Do the linear address calculation for the ins/outs emulation using a new
API function 'vie_calculate_gla()'.

While the current implementation is simplistic it forms the basis of doing
segmentation checks if the guest is in 32-bit protected mode.


# da11f4aa 24-May-2014 Neel Natu <neel@FreeBSD.org>

Add libvmmapi functions vm_copyin() and vm_copyout() to copy into and out
of the guest linear address space. These APIs in turn use a new ioctl
'VM_GLA2GPA' to convert the guest linear address to guest physical.

Use the new copyin/copyout APIs when emulating ins/outs instruction in
bhyve(8).


# e813a873 24-May-2014 Neel Natu <neel@FreeBSD.org>

Consolidate all the information needed by the guest page table walker into
'struct vm_guest_paging'.

Check for canonical addressing in vmm_gla2gpa() and inject a protection
fault into the guest if a violation is detected.

If the page table walk is restarted in vmm_gla2gpa() then reset 'ptpphys' to
point to the root of the page tables.


# 37a723a5 24-May-2014 Neel Natu <neel@FreeBSD.org>

When injecting a page fault into the guest also update the guest's %cr2 to
indicate the faulting linear address.

If the guest PML4 entry has the PG_PS bit set then inject a page fault into
the guest with the PGEX_RSV bit set in the error_code.

Get rid of redundant checks for the PG_RW violations when walking the page
tables.


# d17b5104 22-May-2014 Neel Natu <neel@FreeBSD.org>

Add emulation of the "outsb" instruction. NetBSD guests use this to write to
the UART FIFO.

The emulation is constrained in a number of ways: 64-bit only, doesn't check
for all exception conditions, limited to i/o ports emulated in userspace.

Some of these constraints will be relaxed in followup commits.

Requested by: grehan
Reviewed by: tychon (partially and a much earlier version)


# fd949af6 21-May-2014 Neel Natu <neel@FreeBSD.org>

Inject page fault into the guest if the page table walker detects an invalid
translation for the guest linear address.


# e4c8a13d 18-May-2014 Neel Natu <neel@FreeBSD.org>

Add PG_U (user/supervisor) checks when translating a guest linear address
to a guest physical address.

PG_PS (page size) field is valid only in a PDE or a PDPTE so it is now
checked only in non-terminal paging entries.

Ignore the upper 32-bits of the CR3 for PAE paging.


# b3e9732a 15-May-2014 John Baldwin <jhb@FreeBSD.org>

Implement a PCI interrupt router to route PCI legacy INTx interrupts to
the legacy 8259A PICs.
- Implement an ICH-comptabile PCI interrupt router on the lpc device with
8 steerable pins configured via config space access to byte-wide
registers at 0x60-63 and 0x68-6b.
- For each configured PCI INTx interrupt, route it to both an I/O APIC
pin and a PCI interrupt router pin. When a PCI INTx interrupt is
asserted, ensure that both pins are asserted.
- Provide an initial routing of PCI interrupt router (PIRQ) pins to
8259A pins (ISA IRQs) and initialize the interrupt line config register
for the corresponding PCI function with the ISA IRQ as this matches
existing hardware.
- Add a global _PIC method for OSPM to select the desired interrupt routing
configuration.
- Update the _PRT methods for PCI bridges to provide both APIC and legacy
PRT tables and return the appropriate table based on the configured
routing configuration. Note that if the lpc device is not configured, no
routing information is provided.
- When the lpc device is enabled, provide ACPI PCI link devices corresponding
to each PIRQ pin.
- Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A
pins via the ELCR.
- Mark the power management SCI as level triggered.
- Don't hardcode the number of elements in Packages in the source for
the DSDT. iasl(8) will fill in the actual number of elements, and
this makes it simpler to generate a Package with a variable number of
elements.

Reviewed by: tycho


# e50ce2aa 01-May-2014 Neel Natu <neel@FreeBSD.org>

Add logic in the HLT exit handler to detect if the guest has put all vcpus
to sleep permanently by executing a HLT with interrupts disabled.

When this condition is detected the guest with be suspended with a reason of
VM_SUSPEND_HALT and the bhyve(8) process will exit.

Tested by executing "halt" inside a RHEL7-beta guest.

Discussed with: grehan@
Reviewed by: jhb@, tychon@


# c6a0cc2e 29-Apr-2014 Neel Natu <neel@FreeBSD.org>

Some Linux guests will implement a 'halt' by disabling the APIC and executing
the 'HLT' instruction. This condition was detected by 'vm_handle_hlt()' and
converted into the SPINDOWN_CPU exitcode . The bhyve(8) process would exit
the vcpu thread in response to a SPINDOWN_CPU and when the last vcpu was
spun down it would reset the virtual machine via vm_suspend(VM_SUSPEND_RESET).

This functionality was broken in r263780 in a way that made it impossible
to kill the bhyve(8) process because it would loop forever in
vm_handle_suspend().

Unbreak this by removing the code to spindown vcpus. Thus a 'halt' from
a Linux guest will appear to be hung but this is consistent with the
behavior on bare metal. The guest can be rebooted by using the bhyvectl
options '--force-reset' or '--force-poweroff'.

Reviewed by: grehan@


# f0fdcfe2 28-Apr-2014 Neel Natu <neel@FreeBSD.org>

Allow a virtual machine to be forcibly reset or powered off. This is done
by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual
machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF.

The disposition of VM_SUSPEND is also made available to the exit handler
via the 'u.suspended' member of 'struct vm_exit'.

This capability is exposed via the '--force-reset' and '--force-poweroff'
arguments to /usr/sbin/bhyvectl.

Discussed with: grehan@


# b15a09c0 26-Mar-2014 Neel Natu <neel@FreeBSD.org>

Add an ioctl to suspend a virtual machine (VM_SUSPEND). The ioctl can be called
from any context i.e., it is not required to be called from a vcpu thread. The
ioctl simply sets a state variable 'vm->suspend' to '1' and returns.

The vcpus inspect 'vm->suspend' in the run loop and if it is set to '1' the
vcpu breaks out of the loop with a reason of 'VM_EXITCODE_SUSPENDED'. The
suspend handler waits until all 'vm->active_cpus' have transitioned to
'vm->suspended_cpus' before returning to userspace.

Discussed with: grehan


# e883c9bb 25-Mar-2014 Tycho Nightingale <tychon@FreeBSD.org>

Move the atpit device model from userspace into vmm.ko for better
precision and lower latency.

Approved by: grehan (co-mentor)


# 0775fbb4 15-Mar-2014 Tycho Nightingale <tychon@FreeBSD.org>

Fix a race wherein the source of an interrupt vector is wrongly
attributed if an ExtINT arrives during interrupt injection.

Also, fix a spurious interrupt if the PIC tries to raise an interrupt
before the outstanding one is accepted.

Finally, improve the PIC interrupt latency when another interrupt is
raised immediately after the outstanding one is accepted by creating a
vmexit rather than waiting for one to occur by happenstance.

Approved by: neel (co-mentor)


# 762fd208 11-Mar-2014 Tycho Nightingale <tychon@FreeBSD.org>

Replace the userspace atpic stub with a more functional vmm.ko model.

New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ
can be used to manipulate the pic, and optionally the ioapic, pin state.

Reviewed by: jhb, neel
Approved by: neel (co-mentor)


# dc506506 25-Feb-2014 Neel Natu <neel@FreeBSD.org>

Queue pending exceptions in the 'struct vcpu' instead of directly updating the
processor-specific VMCS or VMCB. The pending exception will be delivered right
before entering the guest.

The order of event injection into the guest is:
- hardware exception
- NMI
- maskable interrupt

In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt
window-exiting and inject it as soon as possible after the hardware exception
is injected. Also since interrupts are inherently asynchronous, injecting
them after the hardware exception should not affect correctness from the
guest perspective.

Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict
it to only deliver x86 hardware exceptions. This new ioctl is now used to
inject a protection fault when the guest accesses an unimplemented MSR.

Discussed with: grehan, jhb
Reviewed by: jhb


# 52e5c8a2 19-Feb-2014 Neel Natu <neel@FreeBSD.org>

Simplify APIC mode switching from MMIO to x2APIC. In part this is done to
simplify the implementation of the x2APIC virtualization assist in VT-x.

Prior to this change the vlapic allowed the guest to change its mode from
xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked
when the virtual machine is created. This is not very constraining because
operating systems already have to deal with BIOS setting up the APIC in
x2APIC mode at boot.

Fix a bug in the CPUID emulation where the x2APIC capability was leaking
from the host to the guest.

Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore
MSR accesses to the vlapic when it is in xAPIC mode.

The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8)
can be used to change the mode to x2APIC instead.

Discussed with: grehan@


# 00f3efe1 04-Feb-2014 John Baldwin <jhb@FreeBSD.org>

Add support for FreeBSD/i386 guests under bhyve.
- Similar to the hack for bootinfo32.c in userboot, define
_MACHINE_ELF_WANT_32BIT in the load_elf32 file handlers in userboot.
This allows userboot to load 32-bit kernels and modules.
- Copy the SMAP generation code out of bootinfo64.c and into its own
file so it can be shared with bootinfo32.c to pass an SMAP to the i386
kernel.
- Use uint32_t instead of u_long when aligning module metadata in
bootinfo32.c in userboot, as otherwise the metadata used 64-bit
alignment which corrupted the layout.
- Populate the basemem and extmem members of the bootinfo struct passed
to 32-bit kernels.
- Fix the 32-bit stack in userboot to start at the top of the stack
instead of the bottom so that there is room to grow before the
kernel switches to its own stack.
- Push a fake return address onto the 32-bit stack in addition to the
arguments normally passed to exec() in the loader. This return
address is needed to convince recover_bootinfo() in the 32-bit
locore code that it is being invoked from a "new" boot block.
- Add a routine to libvmmapi to setup a 32-bit flat mode register state
including a GDT and TSS that is able to start the i386 kernel and
update bhyveload to use it when booting an i386 kernel.
- Use the guest register state to determine the CPU's current instruction
mode (32-bit vs 64-bit) and paging mode (flat, 32-bit, PAE, or long
mode) in the instruction emulation code. Update the gla2gpa() routine
used when fetching instructions to handle flat mode, 32-bit paging, and
PAE paging in addition to long mode paging. Don't look for a REX
prefix when the CPU is in 32-bit mode, and use the detected mode to
enable the existing 32-bit mode code when decoding the mod r/m byte.

Reviewed by: grehan, neel
MFC after: 1 month


# 30b94db8 25-Jan-2014 Neel Natu <neel@FreeBSD.org>

Support level triggered interrupts with VT-x virtual interrupt delivery.

The VMCS field EOI_bitmap[] is an array of 256 bits - one for each vector.
If a bit is set to '1' in the EOI_bitmap[] then the processor will trigger
an EOI-induced VM-exit when it is doing EOI virtualization.

The EOI-induced VM-exit results in the EOI being forwarded to the vioapic
so that level triggered interrupts can be properly handled.

Tested by: Anish Gupta (akgupt3@gmail.com)


# 5b8a8cd1 13-Jan-2014 Neel Natu <neel@FreeBSD.org>

Add an API to rendezvous all active vcpus in a virtual machine. The rendezvous
can be initiated in the context of a vcpu thread or from the bhyve(8) control
process.

The first use of this functionality is to update the vlapic trigger-mode
register when the IOAPIC pin configuration is changed.

Prior to this change we would update the TMR in the virtual-APIC page at
the time of interrupt delivery. But this doesn't work with Posted Interrupts
because there is no way to program the EOI_exit_bitmap[] in the VMCS of
the target at the time of interrupt delivery.

Discussed with: grehan@


# add611fd 08-Jan-2014 Neel Natu <neel@FreeBSD.org>

Don't expose 'vmm_ipinum' as a global.


# 0492757c 01-Jan-2014 Neel Natu <neel@FreeBSD.org>

Restructure the VMX code to enter and exit the guest. In large part this change
hides the setjmp/longjmp semantics of VM enter/exit. vmx_enter_guest() is used
to enter guest context and vmx_exit_guest() is used to transition back into
host context.

Fix a longstanding race where a vcpu interrupt notification might be ignored
if it happens after vmx_inject_interrupts() but before host interrupts are
disabled in vmx_resume/vmx_launch. We now called vmx_inject_interrupts() with
host interrupts disabled to prevent this.

Suggested by: grehan@


# de5ea6b6 24-Dec-2013 Neel Natu <neel@FreeBSD.org>

vlapic code restructuring to make it easy to support hardware-assist for APIC
emulation.

The vlapic initialization and cleanup is done via processor specific vmm_ops.
This will allow the VT-x/SVM modules to layer any hardware-assist for APIC
emulation or virtual interrupt delivery on top of the vlapic device model.

Add a parameter to 'vcpu_notify_event()' to distinguish between vlapic
interrupts versus other events (e.g. NMI). This provides an opportunity to
use hardware-assists like Posted Interrupts (VT-x) or doorbell MSR (SVM)
to deliver an interrupt to a guest without causing a VM-exit.

Get rid of lapic_pending_intr() and lapic_intr_accepted() and use the
vlapic_xxx() counterparts directly.

Associate an 'Apic Page' with each vcpu and reference it from the 'vlapic'.
The 'Apic Page' is intended to be referenced from the Intel VMCS as the
'virtual APIC page' or from the AMD VMCB as the 'vAPIC backing page'.


# 63e62d39 23-Dec-2013 John Baldwin <jhb@FreeBSD.org>

Add a resume hook for bhyve that runs a function on all CPUs during
resume. For Intel CPUs, invoke vmxon for CPUs that were in VMX mode
at the time of suspend.

Reviewed by: neel


# f80330a8 22-Dec-2013 Neel Natu <neel@FreeBSD.org>

Add a parameter to 'vcpu_set_state()' to enforce that the vcpu is in the IDLE
state before the requested state transition. This guarantees that there is
exactly one ioctl() operating on a vcpu at any point in time and prevents
unintended state transitions.

More details available here:
http://lists.freebsd.org/pipermail/freebsd-virtualization/2013-December/001825.html

Reviewed by: grehan
Reported by: Markiyan Kushnir (markiyan.kushnir at gmail.com)
MFC after: 3 days


# 1c052192 07-Dec-2013 Neel Natu <neel@FreeBSD.org>

If a vcpu disables its local apic and then executes a 'HLT' then spin down the
vcpu and destroy its thread context. Also modify the 'HLT' processing to ignore
pending interrupts in the IRR if interrupts have been disabled by the guest.
The interrupt cannot be injected into the guest in any case so resuming it
is futile.

With this change "halt" from a Linux guest works correctly.

Reviewed by: grehan@
Tested by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)


# 7a3c80aa 02-Dec-2013 Neel Natu <neel@FreeBSD.org>

The 'protection' field in the VM exit collateral for the PAGING exit is not
used - get rid of it.


# 22821874 02-Dec-2013 Neel Natu <neel@FreeBSD.org>

Rename 'vm_interrupt_hostcpu()' to 'vcpu_notify_event()' because the function
has outgrown its original name. Originally this function simply sent an IPI
to the host cpu that a vcpu was executing on but now it does a lot more than
just that.

Reviewed by: grehan@


# 08e3ff32 25-Nov-2013 Neel Natu <neel@FreeBSD.org>

Add HPET device emulation to bhyve.

bhyve supports a single timer block with 8 timers. The timers are all 32-bit
and capable of being operated in periodic mode. All timers support interrupt
delivery using MSI. Timers 0 and 1 also support legacy interrupt routing.

At the moment the timers are not connected to any ioapic pins but that will
be addressed in a subsequent commit.

This change is based on a patch from Tycho Nightingale (tycho.nightingale@pluribusnetworks.com).


# 565bbb86 12-Nov-2013 Neel Natu <neel@FreeBSD.org>

Move the ioapic device model from userspace into vmm.ko. This is needed for
upcoming in-kernel device emulations like the HPET.

The ioctls VM_IOAPIC_ASSERT_IRQ and VM_IOAPIC_DEASSERT_IRQ are used to
manipulate the ioapic pin state.

Discussed with: grehan@
Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)


# 49cc03da 16-Oct-2013 Neel Natu <neel@FreeBSD.org>

Add a new capability, VM_CAP_ENABLE_INVPCID, that can be enabled to expose
'invpcid' instruction to the guest. Currently bhyve will try to enable this
capability unconditionally if it is available.

Consolidate code in bhyve to set the capabilities so it is no longer
duplicated in BSP and AP bringup.

Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid'
instruction is available.

Reviewed by: grehan
MFC after: 3 days


# 318224bb 05-Oct-2013 Neel Natu <neel@FreeBSD.org>

Merge projects/bhyve_npt_pmap into head.

Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries as mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For e.g. the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
Bit Position Interpreted By
PG_V 52 software (accessed bit emulation handler)
PG_RW 53 software (dirty bit emulation handler)
PG_A 0 hardware (aka EPT_PG_RD)
PG_M 1 hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guest's with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by: re
Discussed with: grehan
Reviewed by: alc, kib
Tested by: pho


# 8d39ed16 09-Sep-2013 Peter Grehan <grehan@FreeBSD.org>

Go way past 11 and bump bhyve's max vCPUs to 16.

This should be sufficient for 10.0 and will do
until forthcoming work to avoid limitations
in this area is complete.

Thanks to Bela Lubkin at tidalscale for the
headsup on the apic/cpu id/io apic ASL parameters
that are actually hex values and broke when
written as decimal when 11 vCPUs were configured.

Approved by: re@


# d3c11f40 24-Apr-2013 Peter Grehan <grehan@FreeBSD.org>

Add RIP-relative addressing to the instruction decoder.
Rework the guest register fetch code to allow the RIP to
be extracted from the VMCS while the kernel decoder is
functioning.

Hit by the OpenBSD local-apic code.

Submitted by: neel
Reviewed by: grehan
Obtained from: NetApp


# d5408b1d 11-Apr-2013 Neel Natu <neel@FreeBSD.org>

If vmm.ko could not be initialized correctly then prevent the creation of
virtual machines subsequently.

Submitted by: Chris Torek


# 485b3300 11-Feb-2013 Neel Natu <neel@FreeBSD.org>

Implement guest vcpu pinning using 'pthread_setaffinity_np(3)'.

Prior to this change pinning was implemented via an ioctl (VM_SET_PINNING)
that called 'sched_bind()' on behalf of the user thread.

The ULE implementation of 'sched_bind()' bumps up 'td_pinned' which in turn
runs afoul of the assertion '(td_pinned == 0)' in userret().

Using the cpuset affinity to implement pinning of the vcpu threads works with
both 4BSD and ULE schedulers and has the happy side-effect of getting rid
of a bunch of code in vmm.ko.

Discussed with: grehan


# 912a3e67 19-Jan-2013 Neel Natu <neel@FreeBSD.org>

Add svn properties to the recently merged bhyve source files.

The pre-commit hook will not allow any commits without the svn:keywords
property in head.


# 48a29f4e 28-Nov-2012 Neel Natu <neel@FreeBSD.org>

Cleanup the user-space paging exit handler now that the unified instruction
emulation is in place.

Obtained from: NetApp


# ba9b7bf7 27-Nov-2012 Neel Natu <neel@FreeBSD.org>

Revamp the x86 instruction emulation in bhyve.

On a nested page table fault the hypervisor will:
- fetch the instruction using the guest %rip and %cr3
- decode the instruction in 'struct vie'
- emulate the instruction in host kernel context for local apic accesses
- any other type of mmio access is punted up to user-space (e.g. ioapic)

The decoded instruction is passed as collateral to the user-space process
that is handling the PAGING exit.

The emulation code is fleshed out to include more addressing modes (e.g. SIB)
and more types of operands (e.g. imm8). The source code is unified into a
single file (vmm_instruction_emul.c) that is compiled into vmm.ko as well
as /usr/sbin/bhyve.

Reviewed by: grehan
Obtained from: NetApp


# f352ff0c 23-Oct-2012 Neel Natu <neel@FreeBSD.org>

Maintain state regarding NMI delivery to guest vcpu in VT-x independent manner.
Also add a stats counter to count the number of NMIs delivered per vcpu.

Obtained from: NetApp


# 13ec9371 12-Oct-2012 Peter Grehan <grehan@FreeBSD.org>

Add the guest physical address and r/w/x bits to
the paging exit in preparation for a rework of
bhyve MMIO handling.

Reviewed by: neel
Obtained from: NetApp


# 75dd3366 12-Oct-2012 Neel Natu <neel@FreeBSD.org>

Provide per-vcpu locks instead of relying on a single big lock.

This also gets rid of all the witness.watch warnings related to calling
malloc(M_WAITOK) while holding a mutex.

Reviewed by: grehan


# bda273f2 02-Oct-2012 Neel Natu <neel@FreeBSD.org>

Get rid of assumptions in the hypervisor that the host physical memory
associated with guest physical memory is contiguous.

Rewrite vm_gpa2hpa() to get the GPA to HPA mapping by querying the nested
page tables.


# 341f19c9 28-Sep-2012 Neel Natu <neel@FreeBSD.org>

Get rid of assumptions in the hypervisor that the host physical memory
associated with guest physical memory is contiguous.

In this case vm_malloc() was using vm_gpa2hpa() to indirectly infer whether
or not the address range had already been allocated.

Replace this instead with an explicit API 'vm_gpa_available()' that returns
TRUE if a page is available for allocation in guest physical address space.


# e9027382 25-Sep-2012 Neel Natu <neel@FreeBSD.org>

Add ioctls to control the X2APIC capability exposed by the virtual machine to
the guest.

At the moment this simply sets the state in the 'vcpu' instance but there is
no code that acts upon these settings.


# edf89256 24-Sep-2012 Neel Natu <neel@FreeBSD.org>

Add an explicit exit code 'SPINUP_AP' to tell the controlling process that an
AP needs to be activated by spinning up an execution context for it.

The local apic emulation is now completely done in the hypervisor and it will
detect writes to the ICR_LO register that try to bring up the AP. In response
to such writes it will return to userspace with an exit code of SPINUP_AP.

Reviewed by: grehan


# 98ed632c 24-Sep-2012 Neel Natu <neel@FreeBSD.org>

Stash the 'vm_exit' information in each 'struct vcpu'.

There is no functional change at this time but this paves the way for vm exit
handler functions to easily modify the exit reason going forward.


# cd942e0f 28-Apr-2012 Peter Grehan <grehan@FreeBSD.org>

MSI-x interrupt support for PCI pass-thru devices.

Includes instruction emulation for memory r/w access. This
opens the door for io-apic, local apic, hpet timer, and
legacy device emulation.

Submitted by: ryan dot berryhill at sandvine dot com
Reviewed by: grehan
Obtained from: Sandvine


# 366f6083 12-May-2011 Peter Grehan <grehan@FreeBSD.org>

Import of bhyve hypervisor and utilities, part 1.
vmm.ko - kernel module for VT-x, VT-d and hypervisor control
bhyve - user-space sequencer and i/o emulation
vmmctl - dump of hypervisor register state
libvmm - front-end to vmm.ko chardev interface

bhyve was designed and implemented by Neel Natu.

Thanks to the following folk from NetApp who helped to make this available:
Joe CaraDonna
Peter Snyder
Jeff Heller
Sandeep Mann
Steve Miller
Brian Pawlowski