History log of /freebsd-current/sys/amd64/vmm/vmm_dev.c
Revision Date Author Comments
# e3333648 07-May-2024 Mark Johnston <markj@FreeBSD.org>

vmm: Start reconciling amd64 and arm64 copies of vmm_dev.c

Most of the code in vmm_dev.c and vmm.c can and should be shared between
amd64 and arm64 (and eventually riscv) rather than being duplicated. To
the end of adding a shared implementation in sys/dev/vmm, this patch
eliminates most of the differences between the two copies of vmm_dev.c.

- Remove an unneeded cdefs.h include.
- Simplify the amd64 implementation of vcpu_unlock_one().
- Simplify the arm64 implementation of vcpu_lock_one().
- Pass buffer sizes to alloc_memseg() and get_memseg() on arm64. On
amd64 this is needed for compat ioctls, but these functions should be
merged.
- Make devmem_mmap_single() stricter on arm64.

Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D44995


# 6adf554a 25-Dec-2023 Mark Johnston <markj@FreeBSD.org>

vmm: Fix handling of errors from subyte()

subyte() returns -1 upon an error, not an errno value.

MFC after: 1 week
Fixes: e17eca327633 ("vmm: Avoid embedding cpuset_t ioctl ABIs")


# 5635d5b6 17-Aug-2023 Mark Johnston <markj@FreeBSD.org>

vmm: Fix VM_GET_CPUS compatibility

bhyve in a 13.x jail fails to boot guests with more than one vCPU
because they pass too small a buffer to VM_GET_CPUS, causing the ioctl
handler to return ERANGE. Handle this the same way as cpuset system
calls: make sure that the result can fit in the truncated space, and
relax the check on the cpuset buffer.

As a side effect, fix an insufficient bounds check on "size". The
signed/unsigned comparison with sizeof(cpuset_t) fails to exclude
negative values, so we can end up allocating impossibly large amounts of
memory.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41496


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 0d1ff2b0 12-Jul-2023 Gleb Smirnoff <glebius@FreeBSD.org>

vmm: don't leak locks exiting vmmdev_ioctl()

At least an error from vcpu_lock_all() at line 553 would leak
memseg lock. There might be other cases as well.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D40981


# 30f0328a 12-Jul-2023 Gleb Smirnoff <glebius@FreeBSD.org>

vmm: don't return random error from vcpu_lock_all() if vcpu is empty

When vcpu array is empty, function would return random value from
stack. What I observed was -1.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D40980


# e17eca32 23-May-2023 Mark Johnston <markj@FreeBSD.org>

vmm: Avoid embedding cpuset_t ioctl ABIs

Commit 0bda8d3e9f7a ("vmm: permit some IPIs to be handled by userspace")
embedded cpuset_t into the vmm(4) ioctl ABI. This was a mistake since
we otherwise have some leeway to change the cpuset_t for the whole
system, but we want to keep the vmm ioctl ABI stable.

Rework IPI reporting to avoid this problem. Along the way, make VM_RUN
a bit more efficient:
- Split vmexit metadata out of the main VM_RUN structure. This data is
only written by the kernel.
- Have userspace pass a cpuset_t pointer and cpusetsize in the VM_RUN
structure, as is done for cpuset syscalls.
- Have the destination CPU mask for VM_EXITCODE_IPIs live outside the
vmexit info structure, and make VM_RUN copy it out separately. Zero
out any extra bytes in the CPU mask, like cpuset syscalls do.
- Modify the vmexit handler prototype to take a full VM_RUN structure.

PR: 271330
Reviewed by: corvink, jhb (previous versions)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40113


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 0f735657 24-Mar-2023 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove vmctx member from struct vm_snapshot_meta.

This is a userland-only pointer that isn't relevant to the kernel and
doesn't belong in the ioctl structure shared between userland and the
kernel. For the kernel, the old structure for the ioctl is still
supported under COMPAT_FREEBSD13.

This changes vm_snapshot_req() in libvmmapi to accept an explicit
vmctx argument.

It also changes vm_snapshot_guest2host_addr to take an explicit vmctx
argument. As part of this change, move the declaration for this
function and its wrapper macro from vmm_snapshot.h to snapshot.h as it
is a userland-only API.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38125


# ccf32a68 20-Jan-2023 Robert Wing <rew@FreeBSD.org>

vmm: take exclusive mem_segs_lock when (un)assigning ppt dev

PR: 268744
Reported by: mmatalka@gmail.com
Reviewed by: corvink, markj, jhb
Fixes: 67b69e76e8ee ("vmm: Use an sx lock to protect the memory map.")
Differential Revision: https://reviews.freebsd.org/D37962


# d212d6eb 09-Dec-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Avoid infinite loop in vcpu_lock_all error case.

Reported by: Coverity (CIDs 1501060,1501071)
Reviewed by: corvink, markj, emaste
Differential Revision: https://reviews.freebsd.org/D37648


# 91980db1 09-Dec-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Don't lock a vCPU for VM_PPTDEV_MSI[X].

These are manipulating state in a ppt(4) device none of which is
vCPU-specific. Mark the vcpu fields in the relevant ioctl structures
as unused, but don't remove them for now.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37639


# 62be9ffd 09-Dec-2022 John Baldwin <jhb@FreeBSD.org>

vmm: VM_GET/SET_KERNEMU_DEV should run with the vCPU locked.

Reviewed by: corvink, kib, markj
Differential Revision: https://reviews.freebsd.org/D37638


# 98568a00 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Allocate vCPUs on first use of a vCPU.

Convert the vcpu[] array in struct vm to an array of pointers and
allocate vCPUs on first use. This avoids always allocating VM_MAXCPU
vCPUs for each VM, but instead only allocates the vCPUs in use. A new
per-VM sx lock is added to serialize attempts to allocate vCPUs on
first use. However, a given vCPU is never freed while the VM is
active, so the pointer is read via an unlocked read first to avoid the
need for the lock in the common case once the vCPU has been created.

Some ioctls need to lock all vCPUs. To prevent races with ioctls that
want to allocate a new vCPU, these ioctls also lock the sx lock that
protects vCPU creation.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37174


# 223de44c 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm devmem_mmap_single: Bump object reference under memsegs lock.

Reported by: markj
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37273


# 67b69e76 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Use an sx lock to protect the memory map.

Previously bhyve obtained a "read lock" on the memory map for ioctls
needing to read the map by locking the last vCPU. This is now
replaced by a new per-VM sx lock. Modifying the map requires
exclusively locking the sx lock as well as locking all existing vCPUs.
Reading the map requires either locking one vCPU or the sx lock.

This permits safely modifying or querying the memory map while some
vCPUs do not exist which will be true in a future commit.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37172


# 08ebb360 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Destroy mutexes.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37171


# 3f0f4b15 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Lookup vcpu pointers in vmmdev_ioctl.

Centralize mapping vCPU IDs to struct vcpu objects in vmmdev_ioctl and
pass vcpu pointers to the routines in vmm.c. For operations that want
to perform an action on all vCPUs or on a single vCPU, pass pointers
to both the VM and the vCPU using a NULL vCPU pointer to request
global actions.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37168


# 0cbc39d5 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm ppt: Remove unused vcpu arg from MSI setup handlers.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37167


# 80cb5d84 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Pass vcpu instead of vm and vcpuid to APIs used from CPU backends.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37162


# d3956e46 18-Nov-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Use struct vcpu in the instruction emulation code.

This passes struct vcpu down in place of struct vm and and integer
vcpu index through the in-kernel instruction emulation code. To
minimize userland disruption, helper macros are used for the vCPU
arguments passed into and through the shared instruction emulation
code.

A few other APIs used by the instruction emulation code have also been
updated to accept struct vcpu in the kernel including
vm_get/set_register and vm_inject_fault.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37161


# 7a0c23da 20-May-2022 John Baldwin <jhb@FreeBSD.org>

vmm: Destroy character devices synchronously.

This fixes a userland race where bhyveload or bhyve can fail to reuse
a VM name after bhyvectl --destroy has returned.

Reported by: Michael Dexter
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D35186


# 3587bfa7 10-Apr-2022 Robert Wing <rew@FreeBSD.org>

vmm: fix set but not used warning


# 64269786 07-Feb-2022 John Baldwin <jhb@FreeBSD.org>

Extend the VMM stats interface to support a dynamic count of statistics.

- Add a starting index to 'struct vmstats' and change the
VM_STATS ioctl to fetch the 64 stats starting at that index.
A compat shim for <= 13 continues to fetch only the first 64
stats.

- Extend vm_get_stats() in libvmmapi to use a loop and a static
thread local buffer which grows to hold the stats needed.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27463


# a8540490 18-Aug-2021 Cyril Zhang <cyril@freebsdfoundation.org>

vmm: Add credential to cdev object

Add a credential to the cdev object in sysctl_vmm_create(), then check
that we have the correct credentials in sysctl_vmm_destroy(). This
prevents a process in one jail from opening or destroying the /dev/vmm
file corresponding to a VM in a sibling jail.

Add regression tests.

Reviewed by: jhb, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31156


# f8a6ec2d 18-Mar-2021 D Scott Phillips <scottph@FreeBSD.org>

bhyve: support relocating fbuf and passthru data BARs

We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.

We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.

[1]: https://github.com/freebsd/uefi-edk2/pull/9/

Originally by: scottph
MFC After: 1 week
Sponsored by: Intel Corporation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D24066


# 1925586e 24-Nov-2020 John Baldwin <jhb@FreeBSD.org>

Honor the disabled setting for MSI-X interrupts for passthrough devices.

Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough
device and invoke it if a write to the MSI-X capability registers
disables MSI-X. This avoids leaving MSI-X interrupts enabled on the
host if a guest device driver has disabled them (e.g. as part of
detaching a guest device driver).

This was found by Chelsio QA when testing that a Linux guest could
switch from MSI-X to MSI interrupts when using the cxgb4vf driver.

While here, explicitly fail requests to enable MSI on a passthrough
device if MSI-X is enabled and vice versa.

Reported by: Sony Arpita Das @ Chelsio
Reviewed by: grehan, markj
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27212


# f4ce0629 20-May-2020 Conrad Meyer <cem@FreeBSD.org>

vmm(4): Add 12 user ABI compat after r349948

Reported by: kp
Reviewed by: jhb, kp
Tested by: kp
Differential Revision: https://reviews.freebsd.org/D24929


# 8a68ae80 15-May-2020 Conrad Meyer <cem@FreeBSD.org>

vmm(4), bhyve(8): Expose kernel-emulated special devices to userspace

Expose the special kernel LAPIC, IOAPIC, and HPET devices to userspace
for use in, e.g., fallback instruction emulation (when userspace has a
newer instruction decode/emulation layer than the kernel vmm(4)).

Plumb the ioctl through libvmmapi and register the memory ranges in
bhyve(8).

Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24525


# 483d953a 04-May-2020 John Baldwin <jhb@FreeBSD.org>

Initial support for bhyve save and restore.

Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495


# b40598c5 15-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (4 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Reviewed by: kib
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D23625
X-Generally looks fine: jhb


# b837dadd 02-Jan-2020 Konstantin Belousov <kib@FreeBSD.org>

bhyve: terminate waiting loops if thread suspension is requested.

PR: 242724
Reviewed by: markj
Reported and tested by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
(previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22881


# 026e4502 12-Jul-2019 Konstantin Belousov <kib@FreeBSD.org>

Fix syntax.

Nod from: jhb
Sponsored by: The FreeBSD Foundation


# 422a8a4d 12-Jul-2019 Scott Long <scottl@FreeBSD.org>

Tie the name limit of a VM to SPECNAMELEN from devfs instead of a
hard-coded value. Don't allocate space for it from the kernel stack.
Account for prefix, suffix, and separator space in the name. This
takes the effective length up to 229 bytes on 13-current, and 37 bytes
on 12-stable. 37 bytes is enough to hold a full GUID string.

PR: 234134
MFC after: 1 week
Differential Revision: http://reviews.freebsd.org/D20924


# a488c9c9 25-Apr-2019 Rodney W. Grimes <rgrimes@FreeBSD.org>

Add accessor function for vm->maxcpus

Replace most VM_MAXCPU constant useses with an accessor function to
vm->maxcpus which for now is initialized and kept at the value of
VM_MAXCPUS.

This is a rework of Fabian Freyer (fabian.freyer_physik.tu-berlin.de)
work from D10070 to adjust it for the cpu topology changes that
occured in r332298

Submitted by: Fabian Freyer (fabian.freyer_physik.tu-berlin.de)
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: bde (mentor), jhb (maintainer)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D18755


# be963bee 31-Jul-2018 Marcelo Araujo <araujo@FreeBSD.org>

- Add the ability to run bhyve(8) within a jail(8).

This patch adds a new sysctl(8) knob "security.jail.vmm_allowed",
by default this option is disable.

Submitted by: Shawn Webb <shawn.webb____hardenedbsd.org>
Reviewed by: jamie@ and myself.
Relnotes: Yes.
Sponsored by: HardenedBSD and G2, Inc.
Differential Revision: https://reviews.freebsd.org/D16057


# 147d12a7 15-May-2018 Antoine Brodin <antoine@FreeBSD.org>

vmmdev: return EFAULT when trying to read beyond VM system memory max address

Currently, when using dd(1) to take a VM memory image, the capture never ends,
reading zeroes when it's beyond VM system memory max address.
Return EFAULT when trying to read beyond VM system memory max address.

Reviewed by: imp, grehan, anish
Approved by: grehan
Differential Revision: https://reviews.freebsd.org/D15156


# 01d822d3 08-Apr-2018 Rodney W. Grimes <rgrimes@FreeBSD.org>

Add the ability to control the CPU topology of created VMs
from userland without the need to use sysctls, it allows the old
sysctls to continue to function, but deprecates them at
FreeBSD_version 1200060 (Relnotes for deprecate).

The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctl's are maintained in a backwards compatible way.

Added command option looks like:
bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single
integer invokes the backwards compatible parse. [,maxcpus=n] is
hidden by #ifdef until kernel support is added, though the api
is put in place.

bhyvectl --get-cpu-topology option added.

Reviewed by: grehan (maintainer, earlier version),
Reviewed by: bcr (manpages)
Approved by: bde (mentor), phk (mentor)
Tested by: Oleg Ginzburg <olevole@olevole.ru> (cbsd)
MFC after: 1 week
Relnotes: Y
Differential Revision: https://reviews.freebsd.org/D9930


# fc276d92 06-Apr-2018 John Baldwin <jhb@FreeBSD.org>

Add a way to temporarily suspend and resume virtual CPUs.

This is used as part of implementing run control in bhyve's debug
server. The hypervisor now maintains a set of "debugged" CPUs.
Attempting to run a debugged CPU will fail to execute any guest
instructions and will instead report a VM_EXITCODE_DEBUG exit to
the userland hypervisor. Virtual CPUs are placed into the debugged
state via vm_suspend_cpu() (implemented via a new VM_SUSPEND_CPU ioctl).
Virtual CPUs can be resumed via vm_resume_cpu() (VM_RESUME_CPU ioctl).

The debug server suspends virtual CPUs when it wishes them to stop
executing in the guest (for example, when a debugger attaches to the
server). The debug server can choose to resume only a subset of CPUs
(for example, when single stepping) or it can choose to resume all
CPUs. The debug server must explicitly mark a CPU as resumed via
vm_resume_cpu() before the virtual CPU will successfully execute any
guest instructions.

Reviewed by: avg, grehan
Tested on: Intel (jhb), AMD (avg)
Differential Revision: https://reviews.freebsd.org/D14466


# 5f8754c0 26-Feb-2018 John Baldwin <jhb@FreeBSD.org>

Add a new variant of the GLA2GPA ioctl for use by the debug server.

Unlike the existing GLA2GPA ioctl, GLA2GPA_NOFAULT does not modify
the guest. In particular, it does not inject any faults or modify
PTEs in the guest when performing an address space translation.

This is used by bhyve's debug server to read and write memory for
the remote debugger.

Reviewed by: grehan
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D14075


# 4f866698 21-Feb-2018 John Baldwin <jhb@FreeBSD.org>

Add two new ioctls to bhyve for batch register fetch/store operations.

These are a convenience for bhyve's debug server to use a single
ioctl for 'g' and 'G' rather than a loop of individual get/set
ioctl requests.

Reviewed by: grehan
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D14074


# c49761dd 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/amd64: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# b4a5a4d0 20-Jan-2017 Andriy Gapon <avg@FreeBSD.org>

vmm_dev: work around a bogus error with gcc 6.3.0

The error is:
vmm_dev.c: In function 'alloc_memseg':
vmm_dev.c:261:11: error: null argument where non-null required (argument 1) [-Werror=nonnull]

Apparently, the gcc is unable to figure out that if a ternary operator
produced a non-NULL value once, then the operator with exactly the same
operands would produce the same value again.

MFC after: 1 week


# 5e4f29c0 06-Jul-2015 Neel Natu <neel@FreeBSD.org>

Move the 'devmem' device nodes from /dev/vmm to /dev/vmm.io

Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual
machines on the host. These tools break if the devmem device nodes also
appear in /dev/vmm.

Requested by: grehan


# 9b1aa8d6 18-Jun-2015 Neel Natu <neel@FreeBSD.org>

Restructure memory allocation in bhyve to support "devmem".

devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.

devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.

Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).

Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D2762
MFC after: 4 weeks


# 9c4d5478 06-May-2015 Neel Natu <neel@FreeBSD.org>

Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().

Prior to this change both functions returned 0 for success, -1 for failure
and +1 to indicate that an exception was injected into the guest.

The numerical value of ERESTART also happens to be -1 so when these functions
returned -1 it had to be translated to a positive errno value to prevent the
VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce
bugs when writing emulation code.

Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if
an exception was delivered to the guest. The return value is 0 or EFAULT so
no additional translation is needed.

Reviewed by: tychon
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D2428


# ef7c2a82 31-Mar-2015 Tycho Nightingale <tychon@FreeBSD.org>

Fix "MOVS" instruction memory to MMIO emulation. Currently updates to
%rdi, %rsi, etc are inadvertently bypassed along with the check to
see if the instruction needs to be repeated per the 'rep' prefix.

Add "MOVS" instruction support for the 'MMIO to MMIO' case.

Reviewed by: neel


# d087a399 17-Jan-2015 Neel Natu <neel@FreeBSD.org>

Simplify instruction restart logic in bhyve.

Keep track of the next instruction to be executed by the vcpu as 'nextrip'.
As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should
start execution.

Also, instruction restart happens implicitly via 'vm_inject_exception()' or
explicitly via 'vm_restart_instruction()'. The APIs behave identically in
both kernel and userspace contexts. The main beneficiary is the instruction
emulation code that executes in both contexts.

bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length'
as readonly:
- Restarting an instruction is now done by calling 'vm_restart_instruction()'
as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout())
- Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP
as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch())

Differential Revision: https://reviews.freebsd.org/D1526
Reviewed by: grehan
MFC after: 2 weeks


# c9c75df4 13-Jan-2015 Neel Natu <neel@FreeBSD.org>

'struct vm_exception' was intended to be used only as the collateral for the
VM_INJECT_EXCEPTION ioctl. However it morphed into other uses like keeping
track pending exceptions for a vcpu. This in turn causes confusion because
some fields in 'struct vm_exception' like 'vcpuid' make sense only in the
ioctl context. It also makes it harder to add or remove structure fields.

Fix this by using 'struct vm_exception' only to communicate information
from userspace to vmm.ko when injecting an exception.

Also, add a field 'restart_instruction' to 'struct vm_exception'. This
field is set to '1' for exceptions where the faulting instruction is
restarted after the exception is handled.

MFC after: 1 week


# 0dafa5cd 30-Dec-2014 Neel Natu <neel@FreeBSD.org>

Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko.

The new RTC emulation supports all interrupt modes: periodic, update ended
and alarm. It is also capable of maintaining the date/time and NVRAM contents
across virtual machine reset. Also, the date/time fields can now be modified
by the guest.

Since bhyve now emulates both the PIT and the RTC there is no need for
"Legacy Replacement Routing" in the HPET so get rid of it.

The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>

Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D1385
MFC after: 2 weeks


# 091d4532 19-Jul-2014 Neel Natu <neel@FreeBSD.org>

Handle nested exceptions in bhyve.

A nested exception condition arises when a second exception is triggered while
delivering the first exception. Most nested exceptions can be handled serially
but some are converted into a double fault. If an exception is generated during
delivery of a double fault then the virtual machine shuts down as a result of
a triple fault.

vm_exit_intinfo() is used to record that a VM-exit happened while an event was
being delivered through the IDT. If an exception is triggered while handling
the VM-exit it will be treated like a nested exception.

vm_entry_intinfo() is used by processor-specific code to get the event to be
injected into the guest on the next VM-entry. This function is responsible for
deciding the disposition of nested exceptions.


# 5fcf252f 07-Jun-2014 Neel Natu <neel@FreeBSD.org>

Add ioctl(VM_REINIT) to reinitialize the virtual machine state maintained
by vmm.ko. This allows the virtual machine to be restarted without having
to destroy it first.

Reviewed by: grehan


# 95ebc360 31-May-2014 Neel Natu <neel@FreeBSD.org>

Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing
it implicitly in vmm.ko.

Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus
and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and
"--get-suspended-cpus" options.

This is in preparation for being able to reset virtual machine state without
having to destroy and recreate it.


# da11f4aa 24-May-2014 Neel Natu <neel@FreeBSD.org>

Add libvmmapi functions vm_copyin() and vm_copyout() to copy into and out
of the guest linear address space. These APIs in turn use a new ioctl
'VM_GLA2GPA' to convert the guest linear address to guest physical.

Use the new copyin/copyout APIs when emulating ins/outs instruction in
bhyve(8).


# b3e9732a 15-May-2014 John Baldwin <jhb@FreeBSD.org>

Implement a PCI interrupt router to route PCI legacy INTx interrupts to
the legacy 8259A PICs.
- Implement an ICH-comptabile PCI interrupt router on the lpc device with
8 steerable pins configured via config space access to byte-wide
registers at 0x60-63 and 0x68-6b.
- For each configured PCI INTx interrupt, route it to both an I/O APIC
pin and a PCI interrupt router pin. When a PCI INTx interrupt is
asserted, ensure that both pins are asserted.
- Provide an initial routing of PCI interrupt router (PIRQ) pins to
8259A pins (ISA IRQs) and initialize the interrupt line config register
for the corresponding PCI function with the ISA IRQ as this matches
existing hardware.
- Add a global _PIC method for OSPM to select the desired interrupt routing
configuration.
- Update the _PRT methods for PCI bridges to provide both APIC and legacy
PRT tables and return the appropriate table based on the configured
routing configuration. Note that if the lpc device is not configured, no
routing information is provided.
- When the lpc device is enabled, provide ACPI PCI link devices corresponding
to each PIRQ pin.
- Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A
pins via the ELCR.
- Mark the power management SCI as level triggered.
- Don't hardcode the number of elements in Packages in the source for
the DSDT. iasl(8) will fill in the actual number of elements, and
this makes it simpler to generate a Package with a variable number of
elements.

Reviewed by: tycho


# f0fdcfe2 28-Apr-2014 Neel Natu <neel@FreeBSD.org>

Allow a virtual machine to be forcibly reset or powered off. This is done
by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual
machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF.

The disposition of VM_SUSPEND is also made available to the exit handler
via the 'u.suspended' member of 'struct vm_exit'.

This capability is exposed via the '--force-reset' and '--force-poweroff'
arguments to /usr/sbin/bhyvectl.

Discussed with: grehan@


# 201b1ccc 10-Apr-2014 Peter Grehan <grehan@FreeBSD.org>

Rework r264179.

- remove redundant code
- remove erroneous setting of the error return
in vmmdev_ioctl()
- use style(9) initialization
- in vmx_inject_pir(), document the race condition
that the final conditional statement was detecting,

Tested with both gcc and clang builds.

Reviewed by: neel


# 0e30c5c0 05-Apr-2014 Warner Losh <imp@FreeBSD.org>

Make the vmm code compile with gcc too. Not entirely sure things are
correct for the pirbase test (since I'd have thought we'd need to do
something even when the offset is 0 and that test looks like a
misguided attempt to not use an uninitialized variable), but it is at
least the same as today.


# b15a09c0 26-Mar-2014 Neel Natu <neel@FreeBSD.org>

Add an ioctl to suspend a virtual machine (VM_SUSPEND). The ioctl can be called
from any context i.e., it is not required to be called from a vcpu thread. The
ioctl simply sets a state variable 'vm->suspend' to '1' and returns.

The vcpus inspect 'vm->suspend' in the run loop and if it is set to '1' the
vcpu breaks out of the loop with a reason of 'VM_EXITCODE_SUSPENDED'. The
suspend handler waits until all 'vm->active_cpus' have transitioned to
'vm->suspended_cpus' before returning to userspace.

Discussed with: grehan


# 762fd208 11-Mar-2014 Tycho Nightingale <tychon@FreeBSD.org>

Replace the userspace atpic stub with a more functional vmm.ko model.

New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ
can be used to manipulate the pic, and optionally the ioapic, pin state.

Reviewed by: jhb, neel
Approved by: neel (co-mentor)


# dc506506 25-Feb-2014 Neel Natu <neel@FreeBSD.org>

Queue pending exceptions in the 'struct vcpu' instead of directly updating the
processor-specific VMCS or VMCB. The pending exception will be delivered right
before entering the guest.

The order of event injection into the guest is:
- hardware exception
- NMI
- maskable interrupt

In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt
window-exiting and inject it as soon as possible after the hardware exception
is injected. Also since interrupts are inherently asynchronous, injecting
them after the hardware exception should not affect correctness from the
guest perspective.

Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict
it to only deliver x86 hardware exceptions. This new ioctl is now used to
inject a protection fault when the guest accesses an unimplemented MSR.

Discussed with: grehan, jhb
Reviewed by: jhb


# e9ed7bc4 03-Feb-2014 Peter Grehan <grehan@FreeBSD.org>

Roll back botched partial MFC :(


# 3cbf3585 29-Jan-2014 John Baldwin <jhb@FreeBSD.org>

Enhance the support for PCI legacy INTx interrupts and enable them in
the virtio backends.
- Add a new ioctl to export the count of pins on the I/O APIC from vmm
to the hypervisor.
- Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for
ISA interrupts.
- Populate the MP Table with I/O interrupt entries for any PCI INTx
interrupts.
- Create a _PRT table under the PCI root bridge in ACPI to route any
PCI INTx interrupts appropriately.
- Track which INTx interrupts are in use per-slot so that functions
that share a slot attempt to distribute their INTx interrupts across
the four available pins.
- Implicitly mask INTx interrupts if either MSI or MSI-X is enabled
and when the INTx DIS bit is set in a function's PCI command register.
Either assert or deassert the associated I/O APIC pin when the
state of one of those conditions changes.
- Add INTx support to the virtio backends.
- Always advertise the MSI capability in the virtio backends.

Submitted by: neel (7)
Reviewed by: neel
MFC after: 2 weeks


# 330baf58 23-Dec-2013 John Baldwin <jhb@FreeBSD.org>

Extend the support for local interrupts on the local APIC:
- Add a generic routine to trigger an LVT interrupt that supports both
fixed and NMI delivery modes.
- Add an ioctl and bhyvectl command to trigger local interrupts inside a
guest. In particular, a global NMI similar to that raised by SERR# or
PERR# can be simulated by asserting LINT1 on all vCPUs.
- Extend the LVT table in the vCPU local APIC to support CMCI.
- Flesh out the local APIC error reporting a bit to cache errors and
report them via ESR when ESR is written to. Add support for asserting
the error LVT when an error occurs. Raise illegal vector errors when
attempting to signal an invalid vector for an interrupt or when sending
an IPI.
- Ignore writes to reserved bits in LVT entries.
- Export table entries the MADT and MP Table advertising the stock x86
config of LINT0 set to ExtInt and LINT1 wired to NMI.

Reviewed by: neel (earlier version)


# f80330a8 22-Dec-2013 Neel Natu <neel@FreeBSD.org>

Add a parameter to 'vcpu_set_state()' to enforce that the vcpu is in the IDLE
state before the requested state transition. This guarantees that there is
exactly one ioctl() operating on a vcpu at any point in time and prevents
unintended state transitions.

More details available here:
http://lists.freebsd.org/pipermail/freebsd-virtualization/2013-December/001825.html

Reviewed by: grehan
Reported by: Markiyan Kushnir (markiyan.kushnir at gmail.com)
MFC after: 3 days


# 4f8be175 16-Dec-2013 Neel Natu <neel@FreeBSD.org>

Add an API to deliver message signalled interrupts to vcpus. This allows
callers treat the MSI 'addr' and 'data' fields as opaque and also lets
bhyve implement multiple destination modes: physical, flat and clustered.

Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
Reviewed by: grehan@


# b5b28fc9 27-Nov-2013 Neel Natu <neel@FreeBSD.org>

Add support for level triggered interrupt pins on the vioapic. Prior to this
commit level triggered interrupts would work as long as the pin was not shared
among multiple interrupt sources.

The vlapic now keeps track of level triggered interrupts in the trigger mode
register and will forward the EOI for a level triggered interrupt to the
vioapic. The vioapic in turn uses the EOI to sample the level on the pin and
re-inject the vector if the pin is still asserted.

The vhpet is the first consumer of level triggered interrupts and advertises
that it can generate interrupts on pins 20 through 23 of the vioapic.

Discussed with: grehan@


# 08e3ff32 25-Nov-2013 Neel Natu <neel@FreeBSD.org>

Add HPET device emulation to bhyve.

bhyve supports a single timer block with 8 timers. The timers are all 32-bit
and capable of being operated in periodic mode. All timers support interrupt
delivery using MSI. Timers 0 and 1 also support legacy interrupt routing.

At the moment the timers are not connected to any ioapic pins but that will
be addressed in a subsequent commit.

This change is based on a patch from Tycho Nightingale (tycho.nightingale@pluribusnetworks.com).


# ac7304a7 22-Nov-2013 Neel Natu <neel@FreeBSD.org>

Add an ioctl to assert and deassert an ioapic pin atomically. This will be used
to inject edge triggered legacy interrupts into the guest.

Start using the new API in device models that use edge triggered interrupts:
viz. the 8254 timer and the LPC/uart device emulation.

Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)


# 565bbb86 12-Nov-2013 Neel Natu <neel@FreeBSD.org>

Move the ioapic device model from userspace into vmm.ko. This is needed for
upcoming in-kernel device emulations like the HPET.

The ioctls VM_IOAPIC_ASSERT_IRQ and VM_IOAPIC_DEASSERT_IRQ are used to
manipulate the ioapic pin state.

Discussed with: grehan@
Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)


# e2f5d9a1 28-Oct-2013 Neel Natu <neel@FreeBSD.org>

Remove unnecessary includes of <machine/pmap.h>

Requested by: alc@


# d38cae4a 15-Oct-2013 Neel Natu <neel@FreeBSD.org>

Fix the witness warning that warned against calling uiomove() while holding
the 'vmmdev_mtx' in vmmdev_rw().

Rely on the 'si_threadcount' accounting to ensure that we never destroy the
VM device node while it has operations in progress (e.g. ioctl, mmap etc).

Reported by: grehan
Reviewed by: grehan


# 318224bb 05-Oct-2013 Neel Natu <neel@FreeBSD.org>

Merge projects/bhyve_npt_pmap into head.

Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries as mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For e.g. the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
Bit Position Interpreted By
PG_V 52 software (accessed bit emulation handler)
PG_RW 53 software (dirty bit emulation handler)
PG_A 0 hardware (aka EPT_PG_RD)
PG_M 1 hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guest's with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by: re
Discussed with: grehan
Reviewed by: alc, kib
Tested by: pho


# 0acb0d84 09-May-2013 Neel Natu <neel@FreeBSD.org>

Support array-type of stats in bhyve.

An array-type stat in vmm.ko is defined as follows:
VMM_STAT_ARRAY(IPIS_SENT, VM_MAXCPU, "ipis sent to vcpu");

It is incremented as follows:
vmm_stat_array_incr(vm, vcpuid, IPIS_SENT, array_index, 1);

And output of 'bhyvectl --get-stats' looks like:
ipis sent to vcpu[0] 3114
ipis sent to vcpu[1] 0

Reviewed by: grehan
Obtained from: NetApp


# 26d66b9d 12-Apr-2013 Neel Natu <neel@FreeBSD.org>

Use the MAKEDEV_CHECKNAME flag to check for an invalid device name and return
an error instead of panicking.

Obtained from: NetApp


# d5408b1d 11-Apr-2013 Neel Natu <neel@FreeBSD.org>

If vmm.ko could not be initialized correctly then prevent the creation of
virtual machines subsequently.

Submitted by: Chris Torek


# 485b3300 11-Feb-2013 Neel Natu <neel@FreeBSD.org>

Implement guest vcpu pinning using 'pthread_setaffinity_np(3)'.

Prior to this change pinning was implemented via an ioctl (VM_SET_PINNING)
that called 'sched_bind()' on behalf of the user thread.

The ULE implementation of 'sched_bind()' bumps up 'td_pinned' which in turn
runs afoul of the assertion '(td_pinned == 0)' in userret().

Using the cpuset affinity to implement pinning of the vcpu threads works with
both 4BSD and ULE schedulers and has the happy side-effect of getting rid
of a bunch of code in vmm.ko.

Discussed with: grehan


# 75dd3366 12-Oct-2012 Neel Natu <neel@FreeBSD.org>

Provide per-vcpu locks instead of relying on a single big lock.

This also gets rid of all the witness.watch warnings related to calling
malloc(M_WAITOK) while holding a mutex.

Reviewed by: grehan


# cdc5b9e7 11-Oct-2012 Neel Natu <neel@FreeBSD.org>

Fix warnings generated by 'debug.witness.watch' during VM creation and
destruction for calling malloc() with M_WAITOK while holding a mutex.

Do not allow vmm.ko to be unloaded until all virtual machines are destroyed.


# 7ce04d0a 08-Oct-2012 Neel Natu <neel@FreeBSD.org>

Allocate memory pages for the guest from the host's free page queue.

It is no longer necessary to hard-partition the memory between the host
and guests at boot time.


# f7d51510 03-Oct-2012 Neel Natu <neel@FreeBSD.org>

Change vm_malloc() to map pages in the guest physical address space in 4KB
chunks. This breaks the assumption that the entire memory segment is
contiguously allocated in the host physical address space.

This also paves the way to satisfy the 4KB page allocations by requesting
free pages from the VM subsystem as opposed to hard-partitioning host memory
at boot time.


# 341f19c9 28-Sep-2012 Neel Natu <neel@FreeBSD.org>

Get rid of assumptions in the hypervisor that the host physical memory
associated with guest physical memory is contiguous.

In this case vm_malloc() was using vm_gpa2hpa() to indirectly infer whether
or not the address range had already been allocated.

Replace this instead with an explicit API 'vm_gpa_available()' that returns
TRUE if a page is available for allocation in guest physical address space.


# e9027382 25-Sep-2012 Neel Natu <neel@FreeBSD.org>

Add ioctls to control the X2APIC capability exposed by the virtual machine to
the guest.

At the moment this simply sets the state in the 'vcpu' instance but there is
no code that acts upon these settings.


# 177fd533 25-Aug-2012 Peter Grehan <grehan@FreeBSD.org>

Add sysctls to display the total and free amount of hard-wired mem for VMs
# sysctl hw.vmm
hw.vmm.mem_free: 2145386496
hw.vmm.mem_total: 2145386496

Submitted by: Takeshi HASEGAWA hasegaw at gmail com


# cd942e0f 28-Apr-2012 Peter Grehan <grehan@FreeBSD.org>

MSI-x interrupt support for PCI pass-thru devices.

Includes instruction emulation for memory r/w access. This
opens the door for io-apic, local apic, hpet timer, and
legacy device emulation.

Submitted by: ryan dot berryhill at sandvine dot com
Reviewed by: grehan
Obtained from: Sandvine


# 366f6083 12-May-2011 Peter Grehan <grehan@FreeBSD.org>

Import of bhyve hypervisor and utilities, part 1.
vmm.ko - kernel module for VT-x, VT-d and hypervisor control
bhyve - user-space sequencer and i/o emulation
vmmctl - dump of hypervisor register state
libvmm - front-end to vmm.ko chardev interface

bhyve was designed and implemented by Neel Natu.

Thanks to the following folk from NetApp who helped to make this available:
Joe CaraDonna
Peter Snyder
Jeff Heller
Sandeep Mann
Steve Miller
Brian Pawlowski