History log of /openbsd-current/usr.sbin/vmd/vm.c
Revision Date Author Comments
# 1.100 29-Apr-2024 dv

vmm & vmd: drop "continue" flag to simplify running a vcpu.

There's no need to distinguish the "first" time running a vcpu from
the subsequent times because vmm(4) uses in-kernel state tracking
the last vm exit reason to optimize the logic for updating vcpu
registers from userland. While here, clean up the DPRINTF's to make
the Intel VMX logic similar to the AMD SVM.

ok mlarkin@


# 1.99 09-Apr-2024 dv

vmm/vmd: add exception injection and refactor inject api.

In order to continue work on mmio and other instruction emulation,
vmd(8) needs the ability to inject exceptions (like page faults)
from userland.

Refactor the way events are injected from userland, cleaning up how
hardware (external) interrupts are injected in the process.

ok mlarkin@


Revision tags: OPENBSD_7_5_BASE
# 1.98 20-Feb-2024 dv

Utilize separate threads for RX and TX in vmd(8)'s vionet.

This commit adds multithreading to allow both virtqueues to be
processed in parallel along with additional synchronization primitives
to protect device configuration state. Allowing RX and TX to operate
independently reduces overall network latency for guests and helps
alleviate the TX side dominating cpu time.

Tested with help from phessler@, kn@, and mlarkin@. ok mlarkin@.


# 1.97 05-Feb-2024 dv

Cleanup fcntl(3) usage and fd lifetimes in vmd(8).

Remove extraneous fcntl(3) usage for setting fd features that can
be set at time of open(2), pipe2(2), or socketpair(2). Also cleans
up pty creation switching to using functions from libutil instead
of direct ioctl(2) calls.

ok mlarkin@, original diff ok claudio@ as well.


# 1.96 18-Jan-2024 claudio

Use imsg_get_fd() in vmd.

vmd uses a lot of fd passing and does it sometimes via extra abstraction
so this just tries to convert the code without any optimisations.

ok dv@


# 1.95 10-Jan-2024 dv

vmm/vmd: add io instruction length to exit information.

Add the instruction length to the vm exit information to allow
vmd(8) to manipulate the instruction pointer after io emulation.
This is preparation for emulating string-based io instructions.

Removes the instruction pointer update from the kernel (vmm(4)) as
well as the instruction length checks, which were overly restrictive
anyways based on the way prefixes work in x86 instructions.

ok mlarkin@
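
A rough sketch of what moving the instruction pointer update to userland
can look like. The struct io_exit_info type and its fields are
hypothetical stand-ins, not the actual vmm(4) exit structure.

```c
#include <stdint.h>

/* Hypothetical exit record; field names are assumptions. */
struct io_exit_info {
	uint64_t rip;		/* guest instruction pointer at exit */
	uint8_t	 insn_len;	/* length of the exiting instruction */
};

/*
 * After emulating the io instruction, the guest resumes at the next
 * instruction: the old rip plus the reported instruction length.
 */
static uint64_t
next_rip(const struct io_exit_info *e)
{
	return e->rip + e->insn_len;
}
```

Because prefixes change the encoded length (for example, an operand-size
prefix turns a 1-byte OUT into a 2-byte one), using the reported length
avoids hard-coding assumptions about the instruction size.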


Revision tags: OPENBSD_7_4_BASE
# 1.94 26-Sep-2023 dv

vmd(8): disambiguate log messages per vm and device.

The logging output from vmd(8) often specifies the function performing
the logging, but leaves which vm or vm device to guesswork and
reading tea leaves.

Change the logging formatting to prefix with information about the
specific vm and potentially the device subprocess. Most of this
logging is behind the "verbose" mode, but for warnings this will
clarify which vm or device logged the warning.

The format of vm/<name>/<device><index> is chosen to be concise and
less ugly than other approaches. This adjusts the process naming
for devices to match, dropping the use of brackets.

In the process of this change, updating log settings dynamically
via vmctl(8) is fixed by properly broadcasting that information to
the device subprocesses. The "vmm" process also now updates its own
state properly, so settings survive vm reboots.

ok mlarkin@


# 1.93 26-Sep-2023 dv

vmd(8): fix vm pause deadlock.

When vcpu threads pause, they are holding the run mutex lock. If
the event thread is asked to assert an irq on the pic and interrupts
are pending, it will try to take the run mutex lock on the vcpu.
This deadlocks.

Release the lock in the vcpu thread before waiting on the pause
condition variable.

ok mlarkin@


# 1.92 23-Sep-2023 dv

vmd(8): log vmd's vm id, not vmm's in vcpu_run_loop.

Some guests cause a warning message during a shutdown. Log the vmd
vm id and not the kernel vmm id as it's next to useless to the end
user. This has annoyed me too much.


# 1.91 06-Sep-2023 dv

vmm(4)/vmd(8): include pending interrupt in vm_run_params.

To remove an ioctl(2) from the vcpu thread hotpath in vmd(8), add
a flag in the vm_run_params structure to indicate if there's another
interrupt pending. This reduces latency in vcpu work related to
i/o as we save a trip into the kernel just to flip the interrupt
pending flag on or off.

Tested by phessler@, mbuhl@, stsp@, and Mischa Peters.

ok mlarkin@


# 1.90 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up a
TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.89 13-May-2023 dv

vmm(4)/vmd(8): switch to anonymous shared mappings.

While splitting out emulated virtio network and block devices into
separate processes, I originally used named mappings via shm_mkstemp(3).
While this functionally achieved the desired result, it had two
unintended consequences:

1) tearing down a vm process and its child processes required
excessive locking as the guest memory was tied into the VFS layer.

2) it was observed by mlarkin@ that actions in other parts of the
VFS layer could cause some of the guest memory to flush to storage,
possibly filling /tmp.

This commit adds a new vmm(4) ioctl dedicated to allowing a process
to request that the kernel share a mapping of guest memory into its own vm
space. This requires an open fd to /dev/vmm (requiring root) and
both the "vmm" and "proc" pledge(2) promises. In addition, the caller
must know enough about the original memory ranges to reconstruct them
to make the vm's ranges.

Tested with help from Mischa Peters.

ok mlarkin@


# 1.88 28-Apr-2023 dv

vmd(8)/vmctl(8): allow vm owners to override boot kernel.

vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.

This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.

ok mlarkin@


# 1.87 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
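
To make the pfn versus address distinction concrete, a small sketch
follows. It assumes the 4096-byte page size used by legacy virtio-pci
for the queue address register; the names VQ_PAGE_SHIFT and
vq_pfn_to_gpa() are hypothetical.

```c
#include <stdint.h>

#define VQ_PAGE_SHIFT	12	/* assumption: 4096-byte virtqueue pages */

/*
 * Convert the 32-bit page frame number written by the driver into a
 * guest physical address. The result needs 64 bits: a large pfn
 * addresses memory above 4G.
 */
static uint64_t
vq_pfn_to_gpa(uint32_t pfn)
{
	return (uint64_t)pfn << VQ_PAGE_SHIFT;
}
```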


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
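
The class of bug described here can be sketched with a half-open
interval check. The struct and function names are hypothetical and do
not mirror vmd's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range descriptor. */
struct mem_range {
	uint64_t gpa;	/* guest physical start address */
	uint64_t size;	/* length in bytes */
};

/* True if gpa lies in [r->gpa, r->gpa + r->size): the end is exclusive. */
static bool
range_contains(const struct mem_range *r, uint64_t gpa)
{
	return gpa >= r->gpa && gpa - r->gpa < r->size;
}

/* Index of the range holding gpa, or -1 if no range does. */
static int
find_range(const struct mem_range *ranges, size_t n, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (range_contains(&ranges[i], gpa))
			return (int)i;
	return -1;
}
```

With an inclusive end (<= instead of <), the first byte of the next
range would be attributed to the previous one, which matches the
"wrong starting range" symptom in the commit message.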


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@
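
A minimal sketch of keeping sizes in bytes while still enforcing whole
megabytes. The macro and function names are hypothetical; only the
"bytes everywhere, whole megabytes required" idea comes from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

#define BYTES_PER_MB	(1024ULL * 1024ULL)

/* Convert a megabyte count (e.g. from a config file) to bytes. */
static uint64_t
mb_to_bytes(uint64_t mb)
{
	return mb * BYTES_PER_MB;
}

/* True if a byte count is a non-zero whole number of megabytes. */
static bool
is_whole_mb(uint64_t bytes)
{
	return bytes != 0 && bytes % BYTES_PER_MB == 0;
}
```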


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin
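
For reference, a local nitems() is typically the usual array-length
macro. The sketch below shows the common definition; whether vmd uses
exactly this spelling is an assumption.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#ifndef nitems
#define nitems(_a)	(sizeof((_a)) / sizeof((_a)[0]))
#endif

/* Example table; the contents are illustrative only. */
static const int example_ports[] = { 0x20, 0x21, 0x40, 0x43, 0x61 };

/* Sum the table without hard-coding its length. */
static int
sum_example_ports(void)
{
	size_t i;
	int sum = 0;

	for (i = 0; i < nitems(example_ports); i++)
		sum += example_ports[i];
	return sum;
}
```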


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
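
A short sketch of the convention: compare against -1 exactly and only
consult errno in that case. The wrapper name read_byte() is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Read one byte from fd.
 * Returns 1 on success, 0 on EOF, or -1 on error with errno set.
 */
static int
read_byte(int fd, unsigned char *out)
{
	ssize_t n = read(fd, out, 1);

	if (n == -1)		/* not "n < 0" */
		return -1;	/* errno is meaningful only on this path */
	return (int)n;
}
```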


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
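
The single-bitfield approach can be sketched as follows. The flag names
and the struct are hypothetical and chosen for illustration; they are
not vmd's actual definitions.

```c
#include <stdbool.h>

/* Hypothetical state flags, one bit each. */
#define VM_ST_RUNNING	0x01u
#define VM_ST_PAUSED	0x02u
#define VM_ST_SHUTDOWN	0x04u

/* Hypothetical vm record with a single state word. */
struct vm_sketch {
	unsigned int state;
};

static void
vm_state_set(struct vm_sketch *vm, unsigned int flags)
{
	vm->state |= flags;
}

static void
vm_state_clear(struct vm_sketch *vm, unsigned int flags)
{
	vm->state &= ~flags;
}

/* True if any of the given flags is set. */
static bool
vm_state_has(const struct vm_sketch *vm, unsigned int flags)
{
	return (vm->state & flags) != 0;
}
```

One word of state can be copied, logged, or sent in a message as a unit,
which is harder to get right with several independent variables.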


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@
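
A sketch of the readback described above: bit 5 (0x20) of the byte read
from port 0x61 reflects the channel 2 output. The function and macro
names are hypothetical.

```c
#include <stdint.h>

#define PORT61_TIMER2_OUT	0x20	/* bit 5: PIT channel 2 status */

/*
 * Compose the value returned for a read of port 0x61. "base" carries
 * the other bits of the register; "ch2_fired" is nonzero once the
 * emulated channel 2 counter has expired.
 */
static uint8_t
port61_read(uint8_t base, int ch2_fired)
{
	uint8_t v = base & (uint8_t)~PORT61_TIMER2_OUT;

	if (ch2_fired)
		v |= PORT61_TIMER2_OUT;
	return v;
}
```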


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
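
The two ideas can be sketched roughly as below. This is not the actual
iovec_mem()/vaddr_mem() code: the struct guest_range type and the
gpa_to_ptr()/gpa_to_iov() names and signatures are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical mapping of one guest range into the host process. */
struct guest_range {
	uint64_t gpa;	/* guest physical start */
	size_t	 size;	/* length in bytes */
	uint8_t	*hva;	/* host pointer backing the range */
};

/* Host pointer for [gpa, gpa+len) if it fits in one range, else NULL. */
static void *
gpa_to_ptr(const struct guest_range *r, size_t n, uint64_t gpa, size_t len)
{
	for (size_t i = 0; i < n; i++) {
		if (gpa < r[i].gpa)
			continue;
		uint64_t off = gpa - r[i].gpa;
		if (off < r[i].size && len <= r[i].size - off)
			return r[i].hva + off;
	}
	return NULL;
}

/*
 * Describe [gpa, gpa+len) as iovec segments, one per range touched.
 * Returns the number of segments used, or -1 if the span is unmapped
 * or does not fit in iovcnt entries.
 */
static int
gpa_to_iov(const struct guest_range *r, size_t n, uint64_t gpa, size_t len,
    struct iovec *iov, int iovcnt)
{
	int used = 0;

	while (len > 0) {
		size_t i, chunk;

		for (i = 0; i < n; i++)
			if (gpa >= r[i].gpa && gpa - r[i].gpa < r[i].size)
				break;
		if (i == n || used == iovcnt)
			return -1;
		chunk = r[i].size - (size_t)(gpa - r[i].gpa);
		if (chunk > len)
			chunk = len;
		iov[used].iov_base = r[i].hva + (gpa - r[i].gpa);
		iov[used].iov_len = chunk;
		used++;
		gpa += chunk;
		len -= chunk;
	}
	return used;
}
```

The resulting iovec array can be handed to readv(2) or writev(2), so
data moves between a disk image and guest memory without a bounce buffer.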


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@
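
One way to express an "upward only" rule is a subset test on feature
bits. The sketch below is an assumption about the shape of such a
check, with a hypothetical function name; it is not the code from this
commit.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if every feature bit exposed on the sending host is also
 * available on the receiving host, i.e. sender is a subset of receiver.
 */
static bool
features_compatible(uint32_t sender, uint32_t receiver)
{
	return (sender & ~receiver) == 0;
}
```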


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
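
The rate arithmetic can be illustrated with a small sketch. It assumes
10 bits on the wire per character (8N1 framing) and takes the tick rate
as a parameter; both the framing assumption and the chars_per_tick()
name are illustrative, not taken from the ns8250 module.

```c
#include <stdint.h>

#define BITS_PER_CHAR	10	/* assumption: start + 8 data + stop */

/*
 * Approximate number of characters a line at "baud" can carry during
 * one scheduler tick, given "hz" ticks per second. After this many
 * characters the emulation would pause until the next tick.
 */
static uint32_t
chars_per_tick(uint32_t baud, uint32_t hz)
{
	if (hz == 0)
		return 0;
	return baud / BITS_PER_CHAR / hz;
}
```

Under these assumptions a 9600 baud console moves about 9 characters per
tick at 100 ticks per second, against about 115 at 115200 baud, which is
why the higher setting gives a better console experience.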


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to an LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).
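
As a rough illustration of how memory size and CPU count might be encoded in NVRAM, the sketch below fills three registers. The register indices (0x34/0x35 for memory above 16 MB in 64 KB units, 0x5f for the CPU count minus one) follow a convention commonly read by SeaBIOS; whether this commit uses exactly these registers is an assumption, and all names are hypothetical.

```c
#include <stdint.h>

#define SK_NVRAM_SIZE	128
#define SK_MEM_LO	0x34	/* assumed: memory above 16 MB, low byte */
#define SK_MEM_HI	0x35	/* assumed: memory above 16 MB, high byte */
#define SK_SMP_COUNT	0x5f	/* assumed: number of CPUs minus one */

#define SK_16MB		(16ULL * 1024 * 1024)
#define SK_64KB		(64ULL * 1024)

/* Encode guest memory size and CPU count into the NVRAM array. */
static void
nvram_set_sizes(uint8_t nvram[SK_NVRAM_SIZE], uint64_t mem_bytes,
    unsigned int ncpus)
{
	uint64_t units = 0;

	if (mem_bytes > SK_16MB)
		units = (mem_bytes - SK_16MB) / SK_64KB;
	if (units > 0xffff)
		units = 0xffff;	/* a 16-bit field cannot hold more */

	nvram[SK_MEM_LO] = units & 0xff;
	nvram[SK_MEM_HI] = (units >> 8) & 0xff;
	nvram[SK_SMP_COUNT] = ncpus > 0 ? (uint8_t)(ncpus - 1) : 0;
}
```

With this encoding, a 512 MB guest stores 0x1f00 across the two memory registers (496 MB divided by 64 KB).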


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.
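
Both fixes can be sketched in a few lines. This is a hedged illustration with hypothetical names, not the code from the commit: the first helper shows how a narrow port read should replace only the low bits of %eax, and the second shows a line status register that reports "transmitter ready" (bit 5 on an 8250) from the start.

```c
#include <stdint.h>

#define SK_LSR_TXRDY	0x20u	/* 8250 LSR bit 5: transmit holding register empty */

/*
 * Merge the result of a 1-, 2- or 4-byte IN into the guest's %eax.
 * Only the low `size` bytes are replaced; the upper bits are preserved.
 */
static uint32_t
merge_in_result(uint32_t eax, uint32_t data, unsigned int size)
{
	uint32_t mask;

	switch (size) {
	case 1:
		mask = 0xffu;
		break;
	case 2:
		mask = 0xffffu;
		break;
	default:
		mask = 0xffffffffu;
		break;
	}
	return (eax & ~mask) | (data & mask);
}

/* Initial LSR value: the transmitter is ready before any byte is sent. */
static uint8_t
uart_initial_lsr(void)
{
	return SK_LSR_TXRDY;
}
```

A guest that polls the ready bit before writing its first character would otherwise wait forever, which matches the symptom of no output described above.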


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Timeouts and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch; the
implementation is independent of the underlying switch or bridge.

no objections mlarkin@
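
One way to picture the "locked lladdr" check is a filter on the source address of each Ethernet frame the guest transmits. The sketch below is an assumption about the general technique, not the vmd implementation; the function name and parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ETHER_ADDR_LEN	6
#define SK_ETHER_HDR_LEN	14	/* dst(6) + src(6) + ethertype(2) */

/*
 * Decide whether a frame sent by the guest may leave the VM.
 * With locking enabled, the source MAC must equal the configured one.
 */
static int
frame_src_allowed(const uint8_t *frame, size_t len,
    const uint8_t mac[SK_ETHER_ADDR_LEN], int locked)
{
	if (len < SK_ETHER_HDR_LEN)
		return 0;	/* too short to carry an Ethernet header */
	if (!locked)
		return 1;
	/* The source address follows the 6-byte destination address. */
	return memcmp(frame + SK_ETHER_ADDR_LEN, mac,
	    SK_ETHER_ADDR_LEN) == 0;
}
```

Because the check runs where the frame leaves the guest, it needs no cooperation from the switch or bridge behind it.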


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.99 09-Apr-2024 dv

vmm/vmd: add exception injection and refactor inject api.

In order to continue work on mmio and other instruction emulation,
vmd(8) needs the ability to inject exceptions (like page faults)
from userland.

Refactor the way events are injected from userland, cleaning up how
hardware (external) interrupts are injected in the process.

ok mlarkin@


Revision tags: OPENBSD_7_5_BASE
# 1.98 20-Feb-2024 dv

Utilize separate threads for RX and TX in vmd(8)'s vionet.

This commit adds multithreading to allow both virtqueues to be
processed in parallel along with additional synchronization primitives
to protect device configuration state. Allowing RX and TX to operate
independently reduces overall network latency for guests and helps
alleviate the TX side dominating cpu time.

Tested with help from phessler@, kn@, and mlarkin@. ok mlarkin@.


# 1.97 05-Feb-2024 dv

Cleanup fcntl(3) usage and fd lifetimes in vmd(8).

Remove extraneous fcntl(3) usage for setting fd features that can
be set at time of open(2), pipe2(2), or socketpair(2). Also cleans
up pty creation switching to using functions from libutil instead
of direct ioctl(2) calls.

ok mlarkin@, original diff ok claudio@ as well.


# 1.96 18-Jan-2024 claudio

Use imsg_get_fd() in vmd.

vmd uses a lot of fd passing and does it sometimes via extra abstraction
so this just tries to convert the code without any optimisations.

ok dv@


# 1.95 10-Jan-2024 dv

vmm/vmd: add io instruction length to exit information.

Add the instruction length to the vm exit information to allower
vmd(8) to manipulate the instruction pointer after io emulation.
This is preparation for emulating string-based io instructions.

Removes the instruction pointer update from the kernel (vmm(4)) as
well as the instruction length checks, which were overly restrictive
anyways based on the way prefixes work in x86 instructions.

ok mlarkin@


Revision tags: OPENBSD_7_4_BASE
# 1.94 26-Sep-2023 dv

vmd(8): disambiguate log messages per vm and device.

The logging output from vmd(8) often specifies the function performing
the logging, but leaves which vm or vm device to guesswork and
reading tea leaves.

Change the logging formatting to prefix with information about the
specific vm and potentially the device subprocess. Most of this
logging is behind the "verbose" mode, but for warnings this will
clarify which vm or device logged the warning.

The format of vm/<name>/<device><index> is chosen to be concise and
less ugly than other approaches. This adjusts the process naming
for devices to match, dropping the use of brackets.

In the process of this change, updating log settings dynamically
via vmctl(8) is fixed by properly broadcasting that information to
the device subprocesses. The "vmm" process also now updates its own
state properly, so settings survive vm reboots.

ok mlarkin@


# 1.93 26-Sep-2023 dv

vmd(8): fix vm pause deadlock.

When vcpu threads pause, they are holding the run mutex lock. If
the event thread is asked to assert an irq on the pic and interrupts
are pending, it will try to take the run mutex lock on the vcpu.
This deadlocks.

Release the lock in the vcpu thread before waiting on the pause
condition variable.

ok mlarkin@


# 1.92 23-Sep-2023 dv

vmd(8): log vmd's vm id, not vmm's in vcpu_run_loop.

Some guests cause a warning message during a shutdown. Log the vmd
vm id and not the kernel vmm id as it's next to useless to the end
user. This has annoyed me too much.


# 1.91 06-Sep-2023 dv

vmm(4)/vmd(8): include pending interrupt in vm_run_parmams.

To remove an ioctl(2) from the vcpu thread hotpath in vmd(8), add
a flag in the vm_run_params structure to indicate if there's another
interrupt pending. This reduces latency in vcpu work related to
i/o as we save a trip into the kernel just to flip the interrupt
pending flag on or off.

Tested by phessler@, mbuhl@, stsp@, and Mischa Peters.

ok mlarkin@


# 1.90 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up an
unused TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.89 13-May-2023 dv

vmm(4)/vmd(8): switch to anonymous shared mappings.

While splitting out emulated virtio network and block devices into
separate processes, I originally used named mappings via shm_mkstemp(3).
While this functionally achieved the desired result, it had two
unintended consequences:

1) tearing down a vm process and its child processes required
excessive locking as the guest memory was tied into the VFS layer.

2) it was observed by mlarkin@ that actions in other parts of the
VFS layer could cause some of the guest memory to flush to storage,
possibly filling /tmp.

This commit adds a new vmm(4) ioctl dedicated to allowing a process
request the kernel share a mapping of guest memory into its own vm
space. This requires an open fd to /dev/vmm (requiring root) and
both the "vmm" and "proc" pledge(2) promises. In addition, the caller
must know enough about the original memory ranges to reconstruct them
to make the vm's ranges.

Tested with help from Mischa Peters.

ok mlarkin@


# 1.88 28-Apr-2023 dv

vmd(8)/vmctl(8): allow vm owners to override boot kernel.

vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.

This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.

ok mlarkin@


# 1.87 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get thsoe patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have lead to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when hanlding the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intented to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vpcus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to sets the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end continually de-queueing and re-queueing the
event continually. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
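
A rough sketch of the pacing arithmetic described above, under stated
assumptions: chars_per_tick() is a hypothetical helper, the tick rate of
100 per second and the 10 bits per character (8N1 framing) are assumed
values for the example, and none of this is taken from the ns8250 module.

```c
#include <stdint.h>

#define TICKS_PER_SEC	100	/* assumed tick rate for this sketch */
#define BITS_PER_CHAR	10	/* assumed 8N1: start + 8 data + stop */

/*
 * Number of characters that may be sent before a one-tick delay is
 * inserted, so output approximates the programmed baud rate.
 */
uint32_t
chars_per_tick(uint32_t baud)
{
	uint32_t n = baud / BITS_PER_CHAR / TICKS_PER_SEC;

	return n == 0 ? 1 : n;	/* always allow at least one character */
}
```

With these assumptions, 9600 baud allows 9 characters per tick and 115200
baud allows 115, which matches the commit's advice that 115200 gives a
noticeably faster console.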


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
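
A /31 subnet holds exactly two addresses that differ only in the lowest
bit, so the host side and the VM side of a local interface form a pair.
The sketch below only illustrates that property; slash31_peer() and
same_slash31() are hypothetical helpers, addresses are assumed to be in
host byte order, and how vmd actually chooses the subnet is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the other address of the /31 that contains addr. */
uint32_t
slash31_peer(uint32_t addr)
{
	return addr ^ 1;	/* flip the only host bit of a /31 */
}

/* True if a and b are in the same /31 (all but the lowest bit match). */
bool
same_slash31(uint32_t a, uint32_t b)
{
	return (a >> 1) == (b >> 1);
}
```

For example, if the tap(4) side holds the even address of the pair, the
address offered to the VM would be slash31_peer() of it.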


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to an LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.98 20-Feb-2024 dv

Utilize separate threads for RX and TX in vmd(8)'s vionet.

This commit adds multithreading to allow both virtqueues to be
processed in parallel along with additional synchronization primitives
to protect device configuration state. Allowing RX and TX to operate
independently reduces overall network latency for guests and helps
alleviate the TX side dominating cpu time.

Tested with help from phessler@, kn@, and mlarkin@. ok mlarkin@.


# 1.97 05-Feb-2024 dv

Cleanup fcntl(3) usage and fd lifetimes in vmd(8).

Remove extraneous fcntl(3) usage for setting fd features that can
be set at time of open(2), pipe2(2), or socketpair(2). Also cleans
up pty creation switching to using functions from libutil instead
of direct ioctl(2) calls.

ok mlarkin@, original diff ok claudio@ as well.


# 1.96 18-Jan-2024 claudio

Use imsg_get_fd() in vmd.

vmd uses a lot of fd passing and does it sometimes via extra abstraction
so this just tries to convert the code without any optimisations.

ok dv@


# 1.95 10-Jan-2024 dv

vmm/vmd: add io instruction length to exit information.

Add the instruction length to the vm exit information to allower
vmd(8) to manipulate the instruction pointer after io emulation.
This is preparation for emulating string-based io instructions.

Removes the instruction pointer update from the kernel (vmm(4)) as
well as the instruction length checks, which were overly restrictive
anyways based on the way prefixes work in x86 instructions.

ok mlarkin@


Revision tags: OPENBSD_7_4_BASE
# 1.94 26-Sep-2023 dv

vmd(8): disambiguate log messages per vm and device.

The logging output from vmd(8) often specifies the function performing
the logging, but leaves which vm or vm device to guesswork and
reading tea leaves.

Change the logging formatting to prefix with information about the
specific vm and potentially the device subprocess. Most of this
logging is behind the "verbose" mode, but for warnings this will
clarify which vm or device logged the warning.

The format of vm/<name>/<device><index> is chosen to be concise and
less ugly than other approaches. This adjusts the process naming
for devices to match, dropping the use of brackets.

In the process of this change, updating log settings dynamically
via vmctl(8) is fixed by properly broadcasting that information to
the device subprocesses. The "vmm" process also now updates its own
state properly, so settings survive vm reboots.

ok mlarkin@


# 1.93 26-Sep-2023 dv

vmd(8): fix vm pause deadlock.

When vcpu threads pause, they are holding the run mutex lock. If
the event thread is asked to assert an irq on the pic and interrupts
are pending, it will try to take the run mutex lock on the vcpu.
This deadlocks.

Release the lock in the vcpu thread before waiting on the pause
condition variable.

ok mlarkin@


# 1.92 23-Sep-2023 dv

vmd(8): log vmd's vm id, not vmm's in vcpu_run_loop.

Some guests cause a warning message during a shutdown. Log the vmd
vm id and not the kernel vmm id as it's next to useless to the end
user. This has annoyed me too much.


# 1.91 06-Sep-2023 dv

vmm(4)/vmd(8): include pending interrupt in vm_run_parmams.

To remove an ioctl(2) from the vcpu thread hotpath in vmd(8), add
a flag in the vm_run_params structure to indicate if there's another
interrupt pending. This reduces latency in vcpu work related to
i/o as we save a trip into the kernel just to flip the interrupt
pending flag on or off.

Tested by phessler@, mbuhl@, stsp@, and Mischa Peters.

ok mlarkin@


# 1.90 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up an
unused TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.89 13-May-2023 dv

vmm(4)/vmd(8): switch to anonymous shared mappings.

While splitting out emulated virtio network and block devices into
separate processes, I originally used named mappings via shm_mkstemp(3).
While this functionally achieved the desired result, it had two
unintended consequences:

1) tearing down a vm process and its child processes required
excessive locking as the guest memory was tied into the VFS layer.

2) it was observed by mlarkin@ that actions in other parts of the
VFS layer could cause some of the guest memory to flush to storage,
possibly filling /tmp.

This commit adds a new vmm(4) ioctl dedicated to allowing a process
request the kernel share a mapping of guest memory into its own vm
space. This requires an open fd to /dev/vmm (requiring root) and
both the "vmm" and "proc" pledge(2) promises. In addition, the caller
must know enough about the original memory ranges to reconstruct them
to make the vm's ranges.

Tested with help from Mischa Peters.

ok mlarkin@


# 1.88 28-Apr-2023 dv

vmd(8)/vmctl(8): allow vm owners to override boot kernel.

vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.

This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.

ok mlarkin@


# 1.87 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get thsoe patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have lead to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when hanlding the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intented to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vpcus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to sets the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end continually de-queueing and re-queueing the
event continually. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
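
The vaddr_mem() idea above can be illustrated with a minimal sketch.
This is not the code from vm.c: struct mem_range, its fields, and
gpa_to_hva() are hypothetical names, and the sketch assumes each guest
range is backed by one contiguous host mapping. It returns NULL when
the requested span does not sit inside a single range, which is the
assumption the commit message calls out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range. */
struct mem_range {
	uint64_t gpa;	/* guest physical start address */
	size_t	 size;	/* length in bytes */
	uint8_t	*hva;	/* host mapping backing the range */
};

/*
 * Return a host pointer for [gpa, gpa + len), or NULL if the span
 * does not fit inside a single range.
 */
void *
gpa_to_hva(struct mem_range *r, size_t n, uint64_t gpa, size_t len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* Half-open check: the byte at gpa + size is outside. */
		if (gpa < r[i].gpa || gpa - r[i].gpa >= r[i].size)
			continue;
		/* Reject spans that run past the end of the range. */
		if (len > r[i].size - (gpa - r[i].gpa))
			return NULL;
		return r[i].hva + (gpa - r[i].gpa);
	}
	return NULL;
}
```

A caller would pass the table of ranges built at vm creation and use
the returned pointer directly, for example as an iov_base for readv.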


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@
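
One way to picture the "upward only" rule is as a subset test on
feature bits. The sketch below is an assumption about the shape of such
a check, not vmd's actual code: features_compatible() and the single
32-bit mask are hypothetical, and a real check would compare several
cpuid leaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if every feature bit the sender had also exists on the receiver. */
bool
features_compatible(uint32_t sender, uint32_t receiver)
{
	/* Any bit set in sender but clear in receiver blocks the receive. */
	return (sender & ~receiver) == 0;
}
```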


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to an LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Timeouts and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch; the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.95 10-Jan-2024 dv

vmm/vmd: add io instruction length to exit information.

Add the instruction length to the vm exit information to allow
vmd(8) to manipulate the instruction pointer after io emulation.
This is preparation for emulating string-based io instructions.

Removes the instruction pointer update from the kernel (vmm(4)) as
well as the instruction length checks, which were overly restrictive
anyways based on the way prefixes work in x86 instructions.

ok mlarkin@


Revision tags: OPENBSD_7_4_BASE
# 1.94 26-Sep-2023 dv

vmd(8): disambiguate log messages per vm and device.

The logging output from vmd(8) often specifies the function performing
the logging, but leaves which vm or vm device to guesswork and
reading tea leaves.

Change the logging formatting to prefix with information about the
specific vm and potentially the device subprocess. Most of this
logging is behind the "verbose" mode, but for warnings this will
clarify which vm or device logged the warning.

The format of vm/<name>/<device><index> is chosen to be concise and
less ugly than other approaches. This adjusts the process naming
for devices to match, dropping the use of brackets.

In the process of this change, updating log settings dynamically
via vmctl(8) is fixed by properly broadcasting that information to
the device subprocesses. The "vmm" process also now updates its own
state properly, so settings survive vm reboots.

ok mlarkin@


# 1.93 26-Sep-2023 dv

vmd(8): fix vm pause deadlock.

When vcpu threads pause, they are holding the run mutex lock. If
the event thread is asked to assert an irq on the pic and interrupts
are pending, it will try to take the run mutex lock on the vcpu.
This deadlocks.

Release the lock in the vcpu thread before waiting on the pause
condition variable.

ok mlarkin@
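
The fix described above follows a common pthread pattern: drop the lock
other threads need before blocking on a condition variable. The sketch
below only illustrates that pattern. struct vcpu_sync, its members, and
vcpu_wait_while_paused() are hypothetical names, and the use of a
separate mutex for the pause condition is an assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-vcpu synchronization state. */
struct vcpu_sync {
	pthread_mutex_t	run_mtx;	/* guards vcpu run state */
	pthread_mutex_t	pause_mtx;	/* guards 'paused' */
	pthread_cond_t	unpause_cond;	/* signalled on unpause */
	bool		paused;
};

/* Called by the vcpu thread with run_mtx held. */
void
vcpu_wait_while_paused(struct vcpu_sync *s)
{
	/* Drop the run lock first so the event thread can take it. */
	pthread_mutex_unlock(&s->run_mtx);

	pthread_mutex_lock(&s->pause_mtx);
	while (s->paused)
		pthread_cond_wait(&s->unpause_cond, &s->pause_mtx);
	pthread_mutex_unlock(&s->pause_mtx);

	/* Reacquire before returning to the run loop. */
	pthread_mutex_lock(&s->run_mtx);
}
```

With this ordering, a thread that needs run_mtx while the vcpu is
paused is never stuck behind a sleeping lock holder.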


# 1.92 23-Sep-2023 dv

vmd(8): log vmd's vm id, not vmm's in vcpu_run_loop.

Some guests cause a warning message during a shutdown. Log the vmd
vm id and not the kernel vmm id as it's next to useless to the end
user. This has annoyed me too much.


# 1.91 06-Sep-2023 dv

vmm(4)/vmd(8): include pending interrupt in vm_run_params.

To remove an ioctl(2) from the vcpu thread hotpath in vmd(8), add
a flag in the vm_run_params structure to indicate if there's another
interrupt pending. This reduces latency in vcpu work related to
i/o as we save a trip into the kernel just to flip the interrupt
pending flag on or off.

Tested by phessler@, mbuhl@, stsp@, and Mischa Peters.

ok mlarkin@


# 1.90 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up a
TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.89 13-May-2023 dv

vmm(4)/vmd(8): switch to anonymous shared mappings.

While splitting out emulated virtio network and block devices into
separate processes, I originally used named mappings via shm_mkstemp(3).
While this functionally achieved the desired result, it had two
unintended consequences:

1) tearing down a vm process and its child processes required
excessive locking as the guest memory was tied into the VFS layer.

2) it was observed by mlarkin@ that actions in other parts of the
VFS layer could cause some of the guest memory to flush to storage,
possibly filling /tmp.

This commit adds a new vmm(4) ioctl dedicated to allowing a process
to request that the kernel share a mapping of guest memory into its own vm
space. This requires an open fd to /dev/vmm (requiring root) and
both the "vmm" and "proc" pledge(2) promises. In addition, the caller
must know enough about the original memory ranges to reconstruct them
to make the vm's ranges.

Tested with help from Mischa Peters.

ok mlarkin@


# 1.88 28-Apr-2023 dv

vmd(8)/vmctl(8): allow vm owners to override boot kernel.

vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.

This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.

ok mlarkin@


# 1.87 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
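
The distinction between a queue pfn and a linear address comes down to
one shift. The sketch below assumes the legacy virtio-pci convention of
4096-byte pages; VQ_PAGE_SHIFT and vq_pfn_to_gpa() are illustrative
names, not identifiers taken from vmd.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy virtio-pci expresses the queue location in 4096-byte pages. */
#define VQ_PAGE_SHIFT	12	/* illustrative name */

/* Convert the 32-bit queue pfn written by the driver to a guest address. */
uint64_t
vq_pfn_to_gpa(uint32_t pfn)
{
	/* Widen before shifting so high pfn bits are not lost. */
	return (uint64_t)pfn << VQ_PAGE_SHIFT;
}
```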


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
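
As an illustration of the convention described above, here is a small
sketch of a write loop that tests for exactly -1. write_all() is a
hypothetical helper written for this example, not a function from vmd.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Write the whole buffer, treating only -1 as failure.
 * Returns 0 on success, -1 with errno set on error.
 */
int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	ssize_t n;

	while (len > 0) {
		n = write(fd, p, len);
		if (n == -1) {		/* not "n < 0" */
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

errno is only consulted after the call has reported -1, which matches
the rule that errno is meaningful only in that case.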


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
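
A single state bitfield of the kind described above might look like the
sketch below. The EX_STATE_* values, struct ex_vm, and the helper
functions are all hypothetical; vmd's real flag names and bit
assignments may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, one bit per state. */
#define EX_STATE_RUNNING	0x01
#define EX_STATE_PAUSED		0x02
#define EX_STATE_SHUTDOWN	0x04

struct ex_vm {
	unsigned int state;	/* OR of EX_STATE_* bits */
};

/* Set the paused bit without disturbing the others. */
void
ex_pause(struct ex_vm *vm)
{
	vm->state |= EX_STATE_PAUSED;
}

/* Clear only the paused bit. */
void
ex_unpause(struct ex_vm *vm)
{
	vm->state &= ~EX_STATE_PAUSED;
}

/* Report whether the paused bit is set. */
bool
ex_is_paused(const struct ex_vm *vm)
{
	return (vm->state & EX_STATE_PAUSED) != 0;
}
```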


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
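
A minimal sketch of the vaddr_mem() idea, assuming a hypothetical range
table (struct sk_mem_range and sk_vaddr_mem() are illustrative names, not
vmd's). It returns a host pointer only when the whole span fits inside a
single range, which is the assumption stated above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range descriptor; not vmd's actual struct. */
struct sk_mem_range {
	uint64_t	 gpa;	/* guest physical start address */
	size_t		 size;	/* length in bytes */
	uint8_t		*hva;	/* host mapping of the range */
};

void *
sk_vaddr_mem(const struct sk_mem_range *r, size_t n, uint64_t gpa, size_t len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* Skip ranges that do not contain the first byte. */
		if (gpa < r[i].gpa || gpa - r[i].gpa >= r[i].size)
			continue;
		/* The whole span must sit inside this one range. */
		if (len > r[i].size - (gpa - r[i].gpa))
			return (NULL);
		return (r[i].hva + (gpa - r[i].gpa));
	}
	return (NULL);
}
```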


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
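
A rough sketch of the kind of arithmetic involved, not the calculation
vmd actually performs. It assumes the conventional 16550-style divisor
(baud = 115200 / divisor), 10 bits on the wire per character (8N1), and a
caller-supplied tick rate; all function names are hypothetical.

```c
#include <stdint.h>

/* Baud rate for a divisor latch value, assuming a 115200 base rate. */
unsigned int
sk_baud_from_divisor(uint16_t divisor)
{
	if (divisor == 0)
		return (0);	/* unprogrammed divisor: no rate */
	return (115200 / divisor);
}

/*
 * Characters that fit in one tick at the given baud rate, assuming
 * 10 bits per character (start + 8 data + stop) and hz ticks/second.
 */
unsigned int
sk_chars_per_tick(unsigned int baud, unsigned int hz)
{
	if (hz == 0)
		return (0);
	return (baud / 10 / hz);
}
```

With 100 ticks per second this gives roughly 115 characters per tick at
115200 baud and 9 at 9600, which is why the higher rate gives a more
responsive console.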


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
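
For illustration only: a /31 holds exactly two addresses that differ in
the lowest bit, so the other end of the link can be derived from either
one. The helper name is hypothetical, and which of the two vmd assigns to
the host or the VM is not stated here.

```c
#include <stdint.h>

/* addr is an IPv4 address in host byte order inside a /31. */
uint32_t
sk_slash31_peer(uint32_t addr)
{
	return (addr ^ 1);	/* flip the only host bit */
}
```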


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.96 18-Jan-2024 claudio

Use imsg_get_fd() in vmd.

vmd uses a lot of fd passing and does it sometimes via extra abstraction
so this just tries to convert the code without any optimisations.

ok dv@


# 1.95 10-Jan-2024 dv

vmm/vmd: add io instruction length to exit information.

Add the instruction length to the vm exit information to allow
vmd(8) to manipulate the instruction pointer after io emulation.
This is preparation for emulating string-based io instructions.

Removes the instruction pointer update from the kernel (vmm(4)) as
well as the instruction length checks, which were overly restrictive
anyways based on the way prefixes work in x86 instructions.

ok mlarkin@
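
A sketch of the userland side, using hypothetical stand-in structs rather
than the real vmm(4) ones. The length has to come from the exit
information because prefixes change it: "out dx, al" encodes as one byte
(EE), while "out dx, ax" with an operand-size prefix is two (66 EF).

```c
#include <stdint.h>

/* Hypothetical stand-ins; not the vmm(4) structures. */
struct sk_regs {
	uint64_t	rip;
};
struct sk_io_exit {
	uint8_t		insn_len;	/* length reported with the exit */
};

/* Called once the IN/OUT has been emulated in userland. */
void
sk_advance_rip(struct sk_regs *regs, const struct sk_io_exit *io)
{
	regs->rip += io->insn_len;
}
```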


Revision tags: OPENBSD_7_4_BASE
# 1.94 26-Sep-2023 dv

vmd(8): disambiguate log messages per vm and device.

The logging output from vmd(8) often specifies the function performing
the logging, but leaves which vm or vm device to guesswork and
reading tea leaves.

Change the logging formatting to prefix with information about the
specific vm and potentially the device subprocess. Most of this
logging is behind the "verbose" mode, but for warnings this will
clarify which vm or device logged the warning.

The format of vm/<name>/<device><index> is chosen to be concise and
less ugly than other approaches. This adjusts the process naming
for devices to match, dropping the use of brackets.

In the process of this change, updating log settings dynamically
via vmctl(8) is fixed by properly broadcasting that information to
the device subprocesses. The "vmm" process also now updates its own
state properly, so settings survive vm reboots.

ok mlarkin@


# 1.93 26-Sep-2023 dv

vmd(8): fix vm pause deadlock.

When vcpu threads pause, they are holding the run mutex lock. If
the event thread is asked to assert an irq on the pic and interrupts
are pending, it will try to take the run mutex lock on the vcpu.
This deadlocks.

Release the lock in the vcpu thread before waiting on the pause
condition variable.

ok mlarkin@


# 1.92 23-Sep-2023 dv

vmd(8): log vmd's vm id, not vmm's in vcpu_run_loop.

Some guests cause a warning message during a shutdown. Log the vmd
vm id and not the kernel vmm id as it's next to useless to the end
user. This has annoyed me too much.


# 1.91 06-Sep-2023 dv

vmm(4)/vmd(8): include pending interrupt in vm_run_params.

To remove an ioctl(2) from the vcpu thread hotpath in vmd(8), add
a flag in the vm_run_params structure to indicate if there's another
interrupt pending. This reduces latency in vcpu work related to
i/o as we save a trip into the kernel just to flip the interrupt
pending flag on or off.

Tested by phessler@, mbuhl@, stsp@, and Mischa Peters.

ok mlarkin@


# 1.90 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up an
unused TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.89 13-May-2023 dv

vmm(4)/vmd(8): switch to anonymous shared mappings.

While splitting out emulated virtio network and block devices into
separate processes, I originally used named mappings via shm_mkstemp(3).
While this functionally achieved the desired result, it had two
unintended consequences:

1) tearing down a vm process and its child processes required
excessive locking as the guest memory was tied into the VFS layer.

2) it was observed by mlarkin@ that actions in other parts of the
VFS layer could cause some of the guest memory to flush to storage,
possibly filling /tmp.

This commit adds a new vmm(4) ioctl dedicated to allowing a process
to request that the kernel share a mapping of guest memory into its own vm
space. This requires an open fd to /dev/vmm (requiring root) and
both the "vmm" and "proc" pledge(2) promises. In addition, the caller
must know enough about the original memory ranges to reconstruct them
to make the vm's ranges.

Tested with help from Mischa Peters.

ok mlarkin@


# 1.88 28-Apr-2023 dv

vmd(8)/vmctl(8): allow vm owners to override boot kernel.

vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.

This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.

ok mlarkin@


# 1.87 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
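
To make the pfn versus address distinction concrete, here is a small
sketch. It assumes the 4096-byte page size used by legacy virtio-pci; the
macro and function names are hypothetical.

```c
#include <stdint.h>

#define SK_VIRTIO_PAGE_SIZE	4096ULL	/* assumed legacy virtio page size */

/* Convert the 32-bit queue pfn written by the driver to a guest address. */
uint64_t
sk_queue_pfn_to_gpa(uint32_t pfn)
{
	return ((uint64_t)pfn * SK_VIRTIO_PAGE_SIZE);
}
```

Widening to 64 bits before multiplying matters: a pfn of 0x100000 already
names an address at the 4 GB mark, which does not fit in 32 bits.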


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
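
The class of bug can be sketched like this. Both functions are
illustrative and neither is copied from vmd; the first is only an
assumption about what an inclusive upper bound would look like.

```c
#include <stdint.h>

/* Inclusive upper bound: wrongly accepts gpa == start + size. */
int
sk_in_range_off_by_one(uint64_t gpa, uint64_t start, uint64_t size)
{
	return (gpa >= start && gpa <= start + size);
}

/* Half-open interval: valid addresses are [start, start + size). */
int
sk_in_range(uint64_t gpa, uint64_t start, uint64_t size)
{
	return (gpa >= start && gpa - start < size);
}
```

For a range starting at 0x1000 with size 0x1000, the address 0x2000 is the
first byte of whatever comes next, so only the half-open check rejects it.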


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@
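
As a worked example of "ending at 4g": the start of the carved-out region
is simply 4 GB minus the bios size. The names below are hypothetical and
the function assumes the image is no larger than 4 GB.

```c
#include <stdint.h>

#define SK_4G	0x100000000ULL

/* Guest physical start address of a bios copy that ends at 4 GB. */
uint64_t
sk_bios_copy_start(uint64_t bios_size)
{
	return (SK_4G - bios_size);	/* caller ensures bios_size <= 4 GB */
}
```

A 256 KB image (0x40000 bytes) would therefore be placed at 0xFFFC0000.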


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
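
A small example of the convention, using close(2). The wrapper name is
hypothetical; the point is the comparison against -1 rather than "< 0",
and reading errno only after that comparison succeeds.

```c
#include <errno.h>
#include <unistd.h>

/* Returns 0 on success, or the errno value if close(2) failed. */
int
sk_close_checked(int fd)
{
	if (close(fd) == -1)	/* not "< 0" */
		return (errno);	/* errno is only meaningful here */
	return (0);
}
```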


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
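
The single-bitfield approach can be sketched as below. The flag names and
helpers are hypothetical and chosen for illustration; vmd's real
definitions may differ.

```c
#include <stdint.h>

/* Hypothetical state flags, one bit each. */
#define SK_VM_RUNNING	0x01
#define SK_VM_PAUSED	0x02
#define SK_VM_SHUTDOWN	0x04

/* Return the state with the given flag set. */
unsigned int
sk_state_set(unsigned int state, unsigned int flag)
{
	return (state | flag);
}

/* Return the state with the given flag cleared. */
unsigned int
sk_state_clear(unsigned int state, unsigned int flag)
{
	return (state & ~flag);
}

/* Non-zero if the flag is set in the state. */
int
sk_state_isset(unsigned int state, unsigned int flag)
{
	return ((state & flag) != 0);
}
```

Several conditions (for example running and paused) can then be held in
one variable and tested individually.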


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@
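
The lookup order described above (derived image first, then the base image) might be sketched as follows. The struct ex_image type, its fixed cluster table, and ex_lookup() are hypothetical simplifications and do not reflect the real qcow2 on-disk layout or vmd's code.

```c
#include <stddef.h>

#define EX_NCLUSTERS 4	/* tiny table, for illustration only */

/* Hypothetical in-memory image: NULL means "cluster not present here". */
struct ex_image {
	const struct ex_image	*base;	/* backing image, or NULL */
	const unsigned char	*clusters[EX_NCLUSTERS];
};

/*
 * Walk from the derived image towards its base and return the first
 * copy of the cluster that is found, or NULL if no image has it.
 */
static const unsigned char *
ex_lookup(const struct ex_image *img, size_t idx)
{
	if (idx >= EX_NCLUSTERS)
		return (NULL);
	for (; img != NULL; img = img->base) {
		if (img->clusters[idx] != NULL)
			return (img->clusters[idx]);
	}
	return (NULL);
}
```

Because the derived image only stores the clusters it has changed, altering the base image underneath it changes what these fallback reads return, which is the corruption risk noted above.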


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
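
A minimal sketch of the idea behind vaddr_mem(), assuming a hypothetical struct ex_range that pairs a guest physical range with the address where vmd has it mapped. The names and signature here are illustrative and are not the actual vmd interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range. */
struct ex_range {
	uint64_t	 gpa;	/* guest physical start address */
	size_t		 size;	/* length in bytes */
	void		*hva;	/* where the range is mapped in this process */
};

/*
 * Translate [gpa, gpa + len) into a pointer usable by this process.
 * Returns NULL if the address is unmapped or if the span does not fit
 * inside a single range (the assumption stated in the commit message).
 */
static void *
ex_vaddr_mem(const struct ex_range *ranges, size_t n, uint64_t gpa,
    size_t len)
{
	size_t i, off;

	for (i = 0; i < n; i++) {
		const struct ex_range *r = &ranges[i];

		if (gpa < r->gpa || gpa - r->gpa >= r->size)
			continue;
		off = (size_t)(gpa - r->gpa);
		if (len > r->size - off)
			return (NULL);	/* would cross the range end */
		return ((char *)r->hva + off);
	}
	return (NULL);
}
```

An iovec-based variant would fill one struct iovec per range touched instead of returning a single pointer.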


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
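
To give a feel for the numbers involved, the sketch below derives a baud rate from a UART divisor and estimates the sustainable character rate. It assumes the standard PC UART base rate of 115200 and 10 bits on the wire per character (8N1 framing); the function names are hypothetical and this is not the calculation vmd itself performs.

```c
#include <stdint.h>

#define EX_UART_BASE_RATE	115200U	/* 1.8432 MHz clock / 16 */
#define EX_BITS_PER_CHAR	10U	/* start + 8 data + stop (8N1) */

/* Baud rate selected by a divisor latch value; 0 is treated as invalid. */
static unsigned int
ex_baud_from_divisor(uint16_t divisor)
{
	if (divisor == 0)
		return (0);
	return (EX_UART_BASE_RATE / divisor);
}

/* Approximate characters per second the line can carry at that rate. */
static unsigned int
ex_chars_per_sec(unsigned int baud)
{
	return (baud / EX_BITS_PER_CHAR);
}
```

Under these assumptions 9600 baud carries about 960 characters per second and 115200 baud about 11520, which is why the faster setting gives a noticeably better console.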


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
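
A /31 contains exactly two addresses that differ only in the lowest bit, so given one end of the link the other can be derived directly. The helper below is a hypothetical illustration working on host-byte-order IPv4 addresses; which end vmd assigns to the host and which to the VM is not specified here.

```c
#include <stdint.h>

/* Return the other address in the same /31 (host byte order). */
static uint32_t
ex_slash31_peer(uint32_t addr)
{
	return (addr ^ 1U);
}
```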


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.95 10-Jan-2024 dv

vmm/vmd: add io instruction length to exit information.

Add the instruction length to the vm exit information to allow
vmd(8) to manipulate the instruction pointer after io emulation.
This is preparation for emulating string-based io instructions.

Removes the instruction pointer update from the kernel (vmm(4)) as
well as the instruction length checks, which were overly restrictive
anyways based on the way prefixes work in x86 instructions.

ok mlarkin@


Revision tags: OPENBSD_7_4_BASE
# 1.94 26-Sep-2023 dv

vmd(8): disambiguate log messages per vm and device.

The logging output from vmd(8) often specifies the function performing
the logging, but leaves which vm or vm device to guesswork and
reading tea leaves.

Change the logging formatting to prefix with information about the
specific vm and potentially the device subprocess. Most of this
logging is behind the "verbose" mode, but for warnings this will
clarify which vm or device logged the warning.

The format of vm/<name>/<device><index> is chosen to be concise and
less ugly than other approaches. This adjusts the process naming
for devices to match, dropping the use of brackets.

In the process of this change, updating log settings dynamically
via vmctl(8) is fixed by properly broadcasting that information to
the device subprocesses. The "vmm" process also now updates its own
state properly, so settings survive vm reboots.

ok mlarkin@


# 1.93 26-Sep-2023 dv

vmd(8): fix vm pause deadlock.

When vcpu threads pause, they are holding the run mutex lock. If
the event thread is asked to assert an irq on the pic and interrupts
are pending, it will try to take the run mutex lock on the vcpu.
This deadlocks.

Release the lock in the vcpu thread before waiting on the pause
condition variable.

ok mlarkin@


# 1.92 23-Sep-2023 dv

vmd(8): log vmd's vm id, not vmm's in vcpu_run_loop.

Some guests cause a warning message during a shutdown. Log the vmd
vm id and not the kernel vmm id as it's next to useless to the end
user. This has annoyed me too much.


# 1.91 06-Sep-2023 dv

vmm(4)/vmd(8): include pending interrupt in vm_run_params.

To remove an ioctl(2) from the vcpu thread hotpath in vmd(8), add
a flag in the vm_run_params structure to indicate if there's another
interrupt pending. This reduces latency in vcpu work related to
i/o as we save a trip into the kernel just to flip the interrupt
pending flag on or off.

Tested by phessler@, mbuhl@, stsp@, and Mischa Peters.

ok mlarkin@


# 1.90 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up a
TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.89 13-May-2023 dv

vmm(4)/vmd(8): switch to anonymous shared mappings.

While splitting out emulated virtio network and block devices into
separate processes, I originally used named mappings via shm_mkstemp(3).
While this functionally achieved the desired result, it had two
unintended consequences:

1) tearing down a vm process and its child processes required
excessive locking as the guest memory was tied into the VFS layer.

2) it was observed by mlarkin@ that actions in other parts of the
VFS layer could cause some of the guest memory to flush to storage,
possibly filling /tmp.

This commit adds a new vmm(4) ioctl dedicated to allowing a process
to request that the kernel share a mapping of guest memory into its own vm
space. This requires an open fd to /dev/vmm (requiring root) and
both the "vmm" and "proc" pledge(2) promises. In addition, the caller
must know enough about the original memory ranges to reconstruct them
to make the vm's ranges.

Tested with help from Mischa Peters.

ok mlarkin@


# 1.88 28-Apr-2023 dv

vmd(8)/vmctl(8): allow vm owners to override boot kernel.

vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.

This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.

ok mlarkin@


# 1.87 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
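
The distinction between a queue pfn and a linear address can be sketched as below. The 4096-byte page size is an assumption based on the legacy virtio-pci convention, and the macro and function names are hypothetical.

```c
#include <stdint.h>

#define EX_VIRTIO_PAGE_SIZE	4096ULL	/* assumed legacy virtio page size */

/*
 * Convert the 32-bit page frame number written by the driver into a
 * guest physical address. The result can exceed 32 bits.
 */
static uint64_t
ex_queue_pfn_to_gpa(uint32_t pfn)
{
	return ((uint64_t)pfn * EX_VIRTIO_PAGE_SIZE);
}
```

Treating the 32-bit value as an address directly would place the queue at the wrong location for any pfn other than zero.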


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
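
The class of bug described here can be illustrated with a half-open range check. The struct and function names below are hypothetical and are not taken from vmd.

```c
#include <stdint.h>

/* Hypothetical guest memory range covering [gpa, gpa + size). */
struct ex_mem_range {
	uint64_t gpa;	/* guest physical start address */
	uint64_t size;	/* length in bytes */
};

/*
 * Return 1 if gpa falls inside the range, else 0. The end is
 * exclusive: gpa == r->gpa + r->size is one byte past the range.
 * Subtracting first also avoids overflow in r->gpa + r->size.
 */
static int
ex_range_contains(const struct ex_mem_range *r, uint64_t gpa)
{
	return (gpa >= r->gpa && gpa - r->gpa < r->size);
}
```

An off-by-one version would use `<=` in the second comparison and accept the first byte of whatever follows the range.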


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
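
The convention described above might look like this in practice. The helper name ex_try_open() is hypothetical; open(2), close(2) and errno are the standard interfaces.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open a path read-only. Returns 0 on success, or the errno
 * value reported by open(2) on failure.
 */
static int
ex_try_open(const char *path)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)		/* compare against -1, not "fd < 0" */
		return (errno);	/* errno is only meaningful here */
	close(fd);
	return (0);
}
```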


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@
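
The bit manipulation described in revision 1.32 can be sketched as below. Only the port number and bit position come from the commit message; the function and macro names are hypothetical, and this is not the vmd handler.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 5 of port 0x61, per the commit message. Macro name is made up. */
#define EX_PORT61_CH2_STATUS	(1U << 5)

/*
 * Hypothetical helper: compute the byte a guest reads from port 0x61.
 * "base" holds the other bits; bit 5 reflects whether channel 2 fired.
 */
uint8_t
ex_port61_read(uint8_t base, int ch2_fired)
{
	if (ch2_fired)
		return ((uint8_t)(base | EX_PORT61_CH2_STATUS));
	return ((uint8_t)(base & ~EX_PORT61_CH2_STATUS));
}
```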


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
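
The idea behind vaddr_mem() in revision 1.25 can be illustrated with the sketch below. It is not the vmd implementation: the ex_mem_range struct and ex_vaddr_mem() are hypothetical, and the sketch simply returns NULL when the requested span does not sit inside a single range, which is the assumption the commit message calls out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range. */
struct ex_mem_range {
	uint64_t	 gpa;	/* guest physical start address */
	size_t		 size;	/* length of the range in bytes */
	uint8_t		*hva;	/* where the range is mapped in this process */
};

/*
 * Return a host pointer for [gpa, gpa + len) if the whole span lies
 * inside one range, otherwise NULL.
 */
void *
ex_vaddr_mem(const struct ex_mem_range *r, size_t n, uint64_t gpa, size_t len)
{
	uint64_t off;
	size_t i;

	assert(r != NULL);

	for (i = 0; i < n; i++) {
		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		/* written to avoid overflow in "off + len" */
		if (off < r[i].size && len <= r[i].size - off)
			return (r[i].hva + off);
	}
	return (NULL);
}
```

A pointer obtained this way lets a device model touch a virtio ring in place instead of copying it out and back.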


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@
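
One way to express the "upward only" rule from revision 1.24 is a subset test on feature bits. This is a sketch under that assumption; the function name and the single 32-bit mask are hypothetical, and how vmd actually compares cpu features is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical check: a vm may be received only if every feature bit
 * the sending host exposed is also present on the receiving host.
 * Returns 1 when the migration is allowed, else 0.
 */
int
ex_can_receive(uint32_t sender_features, uint32_t receiver_features)
{
	return ((sender_features & ~receiver_features) == 0);
}
```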


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
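
The rate calculation described in revision 1.20 can be approximated as below. The constants are assumptions for illustration: 10 bits on the wire per character (8N1 framing) and 100 delay ticks per second. The names are hypothetical and the real ns8250 code may compute this differently.

```c
#include <assert.h>

#define EX_BITS_PER_CHAR	10	/* assumption: start + 8 data + stop */
#define EX_TICKS_PER_SEC	100	/* assumption: delay granularity */

/*
 * Hypothetical helper: how many characters may be transmitted before
 * one tick of delay is inserted, for a given programmed baud rate.
 * Always allows at least one character per tick.
 */
unsigned int
ex_chars_per_tick(unsigned int baud)
{
	unsigned int n;

	assert(baud > 0);
	n = baud / EX_BITS_PER_CHAR / EX_TICKS_PER_SEC;
	return (n == 0 ? 1 : n);
}
```

Under these assumptions a 115200 baud console sends about 115 characters per tick while 9600 baud sends 9, which matches the advice above to raise the guest's baud rate.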


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
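
A /31 subnet holds exactly two addresses that differ only in the lowest bit, so the address handed to the VM in revision 1.12 can be derived from the host side by flipping that bit. The sketch below shows just this arithmetic on a host-byte-order IPv4 address; the function name is hypothetical and the way vmd picks the prefix is not covered.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given one address of a /31 (host byte order),
 * return the other address in the same subnet.
 */
uint32_t
ex_slash31_peer(uint32_t addr)
{
	return (addr ^ 1U);
}
```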


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to an LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.94 26-Sep-2023 dv

vmd(8): disambiguate log messages per vm and device.

The logging output from vmd(8) often specifies the function performing
the logging, but leaves which vm or vm device to guesswork and
reading tea leaves.

Change the logging formatting to prefix with information about the
specific vm and potentially the device subprocess. Most of this
logging is behind the "verbose" mode, but for warnings this will
clarify which vm or device logged the warning.

The format of vm/<name>/<device><index> is chosen to be concise and
less ugly than other approaches. This adjusts the process naming
for devices to match, dropping the use of brackets.

In the process of this change, updating log settings dynamically
via vmctl(8) is fixed by properly broadcasting that information to
the device subprocesses. The "vmm" process also now updates its own
state properly, so settings survive vm reboots.

ok mlarkin@


# 1.93 26-Sep-2023 dv

vmd(8): fix vm pause deadlock.

When vcpu threads pause, they are holding the run mutex lock. If
the event thread is asked to assert an irq on the pic and interrupts
are pending, it will try to take the run mutex lock on the vcpu.
This deadlocks.

Release the lock in the vcpu thread before waiting on the pause
condition variable.

ok mlarkin@


# 1.92 23-Sep-2023 dv

vmd(8): log vmd's vm id, not vmm's in vcpu_run_loop.

Some guests cause a warning message during a shutdown. Log the vmd
vm id and not the kernel vmm id as it's next to useless to the end
user. This has annoyed me too much.


# 1.91 06-Sep-2023 dv

vmm(4)/vmd(8): include pending interrupt in vm_run_params.

To remove an ioctl(2) from the vcpu thread hotpath in vmd(8), add
a flag in the vm_run_params structure to indicate if there's another
interrupt pending. This reduces latency in vcpu work related to
i/o as we save a trip into the kernel just to flip the interrupt
pending flag on or off.

Tested by phessler@, mbuhl@, stsp@, and Mischa Peters.

ok mlarkin@


# 1.90 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up an
unused TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.89 13-May-2023 dv

vmm(4)/vmd(8): switch to anonymous shared mappings.

While splitting out emulated virtio network and block devices into
separate processes, I originally used named mappings via shm_mkstemp(3).
While this functionally achieved the desired result, it had two
unintended consequences:

1) tearing down a vm process and its child processes required
excessive locking as the guest memory was tied into the VFS layer.

2) it was observed by mlarkin@ that actions in other parts of the
VFS layer could cause some of the guest memory to flush to storage,
possibly filling /tmp.

This commit adds a new vmm(4) ioctl dedicated to allowing a process to
request that the kernel share a mapping of guest memory into its own vm
space. This requires an open fd to /dev/vmm (requiring root) and
both the "vmm" and "proc" pledge(2) promises. In addition, the caller
must know enough about the original memory ranges to reconstruct them
to make the vm's ranges.

Tested with help from Mischa Peters.

ok mlarkin@


# 1.88 28-Apr-2023 dv

vmd(8)/vmctl(8): allow vm owners to override boot kernel.

vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.

This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.

ok mlarkin@


# 1.87 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
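
The distinction between a queue pfn and an address in revision 1.77 comes down to one multiplication. The sketch below assumes the 4096-byte page size used by the legacy virtio-pci interface; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EX_VIRTIO_PAGE_SIZE	4096ULL	/* assumption: legacy virtio page size */

/*
 * Hypothetical helper: convert the 32-bit page frame number written by
 * the driver into a 64-bit guest physical address.
 */
uint64_t
ex_pfn_to_gpa(uint32_t pfn)
{
	return ((uint64_t)pfn * EX_VIRTIO_PAGE_SIZE);
}
```

The widening cast matters: a pfn of 0x100000 already names an address at the 4 GiB mark, which does not fit in 32 bits.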


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
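
The boundary condition fixed in revision 1.71 can be shown with a half-open range test. This is a sketch with a hypothetical function name, not the vmd code: an address equal to start + size is one byte past the end and must be rejected.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return 1 if gpa lies in [start, start + size),
 * else 0. Using "<" rather than "<=" excludes the first byte past the
 * end of the range.
 */
int
ex_gpa_in_range(uint64_t gpa, uint64_t start, uint64_t size)
{
	return (gpa >= start && gpa - start < size);
}
```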


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.
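
The under-read problem mentioned in revision 1.68 arises because read(2) may return fewer bytes than requested. The loop below is a minimal sketch of the retry pattern that an atomicio-style helper provides; ex_read_full() is a hypothetical name and this is not the atomicio code used by vmd.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling read(2) until "len" bytes have
 * arrived, EOF is reached, or an error other than EINTR occurs.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t
ex_read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;
	ssize_t n;

	assert(buf != NULL);

	while (done < len) {
		n = read(fd, p + done, len - done);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return (-1);
		}
		if (n == 0)
			break;			/* EOF before len bytes */
		done += (size_t)n;
	}
	return ((ssize_t)done);
}
```

A caller compares the return value with the expected length and treats a short count as a failed transfer.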


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.
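
A minimal sketch of the readback, assuming a hypothetical helper that
composes the byte returned for an IN from port 0x61. Only the use of
bit 5 comes from the commit message; the function and macro names are
invented for illustration.

```c
#include <stdint.h>

/* Bit 5 of port 0x61 reports PIT channel 2 status, per the message above. */
#define EX_PORT61_PIT2_OUT	(1 << 5)

/*
 * Hypothetical helper: return the value for a read of port 0x61,
 * keeping the other bits as given and reflecting whether PIT
 * channel 2 has fired in bit 5.
 */
uint8_t
ex_port61_read(uint8_t other_bits, int pit2_fired)
{
	uint8_t v;

	v = (uint8_t)(other_bits & ~EX_PORT61_PIT2_OUT);
	if (pit2_fired)
		v |= EX_PORT61_PIT2_OUT;
	return (v);
}
```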

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.
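
A minimal sketch of the translation idea, not the actual vmd code. The
struct ex_mem_range layout and the ex_vaddr_mem() signature are
assumptions; the sketch only illustrates the stated requirement that a
contiguous guest physical span must sit inside a single memory range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range. */
struct ex_mem_range {
	uint64_t	 gpa;	/* guest physical start address */
	size_t		 size;	/* length of the range in bytes */
	uint8_t		*hva;	/* where the range is mapped in this process */
};

/*
 * Return a host pointer for guest physical [gpa, gpa + len), or NULL
 * if the address is unknown or the span would cross the end of the
 * range that contains gpa.
 */
void *
ex_vaddr_mem(const struct ex_mem_range *r, size_t n, uint64_t gpa, size_t len)
{
	uint64_t off;
	size_t i;

	for (i = 0; i < n; i++) {
		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		if (off >= r[i].size)
			continue;
		if (len > r[i].size - off)
			return (NULL);	/* span leaves this range */
		return (r[i].hva + off);
	}
	return (NULL);
}
```

Returning a direct pointer is what lets callers reference structures
such as virtio rings in place instead of copying them out and back.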

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.92 23-Sep-2023 dv

vmd(8): log vmd's vm id, not vmm's in vcpu_run_loop.

Some guests cause a warning message during a shutdown. Log the vmd
vm id and not the kernel vmm id as it's next to useless to the end
user. This has annoyed me too much.


# 1.91 06-Sep-2023 dv

vmm(4)/vmd(8): include pending interrupt in vm_run_params.

To remove an ioctl(2) from the vcpu thread hotpath in vmd(8), add
a flag in the vm_run_params structure to indicate if there's another
interrupt pending. This reduces latency in vcpu work related to
i/o as we save a trip into the kernel just to flip the interrupt
pending flag on or off.

Tested by phessler@, mbuhl@, stsp@, and Mischa Peters.

ok mlarkin@


# 1.90 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up a
TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.89 13-May-2023 dv

vmm(4)/vmd(8): switch to anonymous shared mappings.

While splitting out emulated virtio network and block devices into
separate processes, I originally used named mappings via shm_mkstemp(3).
While this functionally achieved the desired result, it had two
unintended consequences:

1) tearing down a vm process and its child processes required
excessive locking as the guest memory was tied into the VFS layer.

2) it was observed by mlarkin@ that actions in other parts of the
VFS layer could cause some of the guest memory to flush to storage,
possibly filling /tmp.

This commit adds a new vmm(4) ioctl dedicated to allowing a process to
request that the kernel share a mapping of guest memory into its own vm
space. This requires an open fd to /dev/vmm (requiring root) and
both the "vmm" and "proc" pledge(2) promises. In addition, the caller
must know enough about the original memory ranges to reconstruct them
to make the vm's ranges.

Tested with help from Mischa Peters.

ok mlarkin@


# 1.88 28-Apr-2023 dv

vmd(8)/vmctl(8): allow vm owners to override boot kernel.

vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.

This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.

ok mlarkin@


# 1.87 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.
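
A minimal sketch of the pfn-to-address relationship described above.
The helper name is hypothetical, and the 4096-byte page size is an
assumption taken from the legacy virtio-pci queue address convention,
not from this commit.

```c
#include <stdint.h>

/* Assumed page size of 4096 bytes for the queue pfn. */
#define EX_VIRTIO_PAGE_SHIFT	12

/*
 * Hypothetical helper: convert the 32-bit page frame number written
 * by the driver into a guest physical address.  Widen to 64 bits
 * before shifting so large pfn values are not truncated.
 */
uint64_t
ex_queue_pfn_to_gpa(uint32_t pfn)
{
	return ((uint64_t)pfn << EX_VIRTIO_PAGE_SHIFT);
}
```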

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.
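
A minimal sketch of the boundary condition, using a hypothetical
helper rather than the actual vmd code. A range is treated as the
half-open interval [start, start + size), so the first byte past the
end is rejected.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return non-zero if gpa lies inside the range
 * [start, start + size).  Comparing the offset against size avoids
 * both the off-by-one (<= instead of <) and overflow in start + size.
 */
int
ex_gpa_in_range(uint64_t start, uint64_t size, uint64_t gpa)
{
	return (gpa >= start && gpa - start < size);
}
```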

ok mlarkin@


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
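
The single-range assumption behind vaddr_mem() can be sketched as follows. This is not vmd's implementation: the struct layout and the ex_ names are hypothetical, and only the idea (translate a guest physical address to a host pointer, refusing requests that span ranges) comes from the entry above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range mapped into the host. */
struct ex_mem_range {
	uint64_t gpa;	/* guest physical start address */
	size_t	 size;	/* length in bytes */
	uint8_t	*hva;	/* where the range is mapped in this process */
};

/*
 * Return a host pointer for the guest physical span [gpa, gpa + len), or
 * NULL if the span is not fully contained in a single memory range.
 */
static void *
ex_vaddr_mem(const struct ex_mem_range *r, size_t nranges, uint64_t gpa,
    size_t len)
{
	uint64_t off;
	size_t i;

	for (i = 0; i < nranges; i++) {
		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		if (off >= r[i].size)
			continue;
		/* The request must not run past the end of this range. */
		if (len > r[i].size - off)
			return NULL;
		return r[i].hva + off;
	}
	return NULL;
}
```

A caller that gets a non-NULL pointer can read or write guest data in place, which is what makes bounce buffers unnecessary.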


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.91 06-Sep-2023 dv

vmm(4)/vmd(8): include pending interrupt in vm_run_params.

To remove an ioctl(2) from the vcpu thread hotpath in vmd(8), add
a flag in the vm_run_params structure to indicate if there's another
interrupt pending. This reduces latency in vcpu work related to
i/o as we save a trip into the kernel just to flip the interrupt
pending flag on or off.

Tested by phessler@, mbuhl@, stsp@, and Mischa Peters.

ok mlarkin@


# 1.90 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up an
unused TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.89 13-May-2023 dv

vmm(4)/vmd(8): switch to anonymous shared mappings.

While splitting out emulated virtio network and block devices into
separate processes, I originally used named mappings via shm_mkstemp(3).
While this functionally achieved the desired result, it had two
unintended consequences:

1) tearing down a vm process and its child processes required
excessive locking as the guest memory was tied into the VFS layer.

2) it was observed by mlarkin@ that actions in other parts of the
VFS layer could cause some of the guest memory to flush to storage,
possibly filling /tmp.

This commit adds a new vmm(4) ioctl dedicated to allowing a process to
request that the kernel share a mapping of guest memory into its own vm
space. This requires an open fd to /dev/vmm (requiring root) and
both the "vmm" and "proc" pledge(2) promises. In addition, the caller
must know enough about the original memory ranges to reconstruct them
to make the vm's ranges.

Tested with help from Mischa Peters.

ok mlarkin@


# 1.88 28-Apr-2023 dv

vmd(8)/vmctl(8): allow vm owners to override boot kernel.

vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.

This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.

ok mlarkin@


# 1.87 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
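
The pfn versus address distinction can be shown with a short sketch. The function name and the EX_PAGE_SHIFT constant are hypothetical; the 4096-byte page size is the one used by the legacy virtio-pci queue address register.

```c
#include <stdint.h>

#define EX_PAGE_SHIFT	12	/* legacy virtio-pci uses 4096-byte pages */

/*
 * Convert the 32-bit queue pfn written by the driver into a guest
 * physical address. Widen before shifting so high bits are not lost.
 */
static uint64_t
ex_queue_gpa(uint32_t pfn)
{
	return (uint64_t)pfn << EX_PAGE_SHIFT;
}
```

Treating the register value as an address directly would place the virtqueue 4096 times too low in guest memory, which is why the naming matters.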


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
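
For illustration, a range check that avoids this kind of off-by-one treats the range as half-open. This is a minimal sketch with a hypothetical function name, not the code from the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if gpa lies in the half-open range [start, start + size). */
static bool
ex_gpa_in_range(uint64_t gpa, uint64_t start, size_t size)
{
	/* Written as a subtraction so start + size cannot overflow. */
	return gpa >= start && gpa - start < size;
}
```

The address one byte past the end (start + size) is rejected, so a lookup cannot settle on a range that does not actually contain the address.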


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
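
The convention described in this entry can be sketched as below. The helper name is hypothetical; open(2), strerror(3), and errno are standard.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Open a file read-only. Returns the descriptor, or -1 with errno set.
 */
static int
ex_open_ro(const char *path)
{
	int fd, saved_errno;

	/* Compare against -1 exactly; errno is only meaningful in that case. */
	if ((fd = open(path, O_RDONLY)) == -1) {
		saved_errno = errno;	/* fprintf may overwrite errno */
		fprintf(stderr, "open %s: %s\n", path, strerror(saved_errno));
		errno = saved_errno;
		return -1;
	}
	return fd;
}
```

Testing for `< 0` would usually behave the same, but `== -1` matches the documented failure value and keeps the errno check tied to the one case where it is defined.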


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@
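
As a rough sketch of the readback (function and macro names here are
hypothetical, not taken from vmd), a read of port 0x61 can mirror the PIT
channel 2 state in bit 5 while leaving the other bits alone:

```c
#include <stdint.h>

#define PORT61_PIT_CH2_OUT	(1u << 5)	/* bit 5: channel 2 state */

/*
 * Build the value for a guest read of port 0x61: keep the other bits
 * as given and mirror the PIT channel 2 state in bit 5.
 */
uint8_t
port61_read(uint8_t other_bits, int ch2_fired)
{
	uint8_t v = other_bits & (uint8_t)~PORT61_PIT_CH2_OUT;

	if (ch2_fired)
		v |= PORT61_PIT_CH2_OUT;
	return v;
}
```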


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd-usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
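
A minimal sketch of the translation idea follows. The struct and function
names are hypothetical and the real vaddr_mem() signature may differ; the
point is only that a guest physical span is usable directly when it sits
inside a single memory range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range. */
struct mem_range {
	uint64_t	 gpa;	/* guest physical start address */
	size_t		 size;	/* length in bytes */
	uint8_t		*hva;	/* where the range is mapped in this process */
};

/*
 * Translate [gpa, gpa + len) to a host pointer. Returns NULL unless
 * the whole span sits inside a single range, mirroring the assumption
 * stated in the commit message.
 */
void *
gpa_to_host(const struct mem_range *r, size_t n, uint64_t gpa, size_t len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (gpa < r[i].gpa)
			continue;
		uint64_t off = gpa - r[i].gpa;
		if (off < r[i].size && len <= r[i].size - off)
			return r[i].hva + off;
	}
	return NULL;
}
```

An iovec-based variant would fill one struct iovec per contiguous piece
instead of returning NULL when a span crosses ranges.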


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@
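
The "upward only" rule amounts to a subset test on feature bits. The sketch
below assumes a single 32-bit feature mask and a hypothetical function name;
the real check presumably compares several CPUID-derived values.

```c
#include <stdint.h>

/*
 * A received vm is acceptable only when every feature bit the sending
 * host exposed is also present on the receiving host.
 */
int
features_compatible(uint32_t sender, uint32_t receiver)
{
	return (sender & ~receiver) == 0;
}
```

With the older host as sender and the newer host as receiver the test
passes; in the other direction the sender has bits the receiver lacks and
the vm is refused.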


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
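
A rough sketch of the rate calculation is shown below. It is not the ns8250
code itself: the 8N1 framing (10 bits per character), the 115200 rate at
divisor 1, and both function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define UART_CLOCK_BAUD	115200	/* rate at divisor 1 */
#define BITS_PER_CHAR	10	/* assumes 8N1: start + 8 data + stop */

/* Characters per second that the programmed divisor allows. */
uint32_t
chars_per_second(uint16_t divisor)
{
	if (divisor == 0)
		return 0;	/* invalid divisor */
	return (UART_CLOCK_BAUD / divisor) / BITS_PER_CHAR;
}

/*
 * Characters to emit before pausing for one delay interval, given
 * intervals_per_sec pauses each second. Always allows at least one.
 */
uint32_t
chars_per_interval(uint16_t divisor, uint32_t intervals_per_sec)
{
	uint32_t n;

	assert(intervals_per_sec > 0);
	n = chars_per_second(divisor) / intervals_per_sec;
	return n > 0 ? n : 1;
}
```

Under these assumptions a guest programmed for 9600 baud (divisor 12) gets
about 9 characters per interval at 100 intervals per second, against 115 at
115200 baud, which is why the faster setting feels better on the console.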


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
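
The /31 pairing can be illustrated with a little address arithmetic. This
is only a sketch: how vmd derives the pair from the vm and interface ids,
which prefix it uses, and which side gets the even address are all
assumptions here, and addresses are in host byte order.

```c
#include <stdint.h>

#define MASK_31	0xfffffffeu	/* /31: two addresses per subnet */

/* Host side of pair number idx inside prefix (host byte order). */
uint32_t
local_host_addr(uint32_t prefix, uint32_t idx)
{
	return (prefix + (idx << 1)) & MASK_31;
}

/* Guest side: the other address in the same /31. */
uint32_t
local_guest_addr(uint32_t prefix, uint32_t idx)
{
	return local_host_addr(prefix, idx) | 1u;
}
```

The host address would go on the tap(4) interface and the guest address
would be handed out by the built-in BOOTP server.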


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to an LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.
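
The first bug can be pictured as a merge problem: a 1- or 2-byte IN must
replace only the low bits of %eax. The helper below is a hypothetical
sketch of that merge, not the function vmd uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Merge size bytes of data read from an io port into the guest's
 * %eax, leaving the bits above the access size untouched.
 */
uint32_t
merge_in_data(uint32_t eax, uint32_t data, unsigned int size)
{
	uint32_t mask;

	assert(size == 1 || size == 2 || size == 4);
	mask = (size == 4) ? 0xffffffffu : ((1u << (size * 8)) - 1);
	return (eax & ~mask) | (data & mask);
}
```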


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.90 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up a
TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to use the
latest logic in bgpd(8).

ok mlarkin@


# 1.89 13-May-2023 dv

vmm(4)/vmd(8): switch to anonymous shared mappings.

While splitting out emulated virtio network and block devices into
separate processes, I originally used named mappings via shm_mkstemp(3).
While this functionally achieved the desired result, it had two
unintended consequences:

1) tearing down a vm process and its child processes required
excessive locking as the guest memory was tied into the VFS layer.

2) it was observed by mlarkin@ that actions in other parts of the
VFS layer could cause some of the guest memory to flush to storage,
possibly filling /tmp.

This commit adds a new vmm(4) ioctl dedicated to allowing a process
to request that the kernel share a mapping of guest memory into its own vm
space. This requires an open fd to /dev/vmm (requiring root) and
both the "vmm" and "proc" pledge(2) promises. In addition, the caller
must know enough about the original memory ranges to reconstruct them
to make the vm's ranges.

Tested with help from Mischa Peters.

ok mlarkin@


# 1.88 28-Apr-2023 dv

vmd(8)/vmctl(8): allow vm owners to override boot kernel.

vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.

This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.

ok mlarkin@


# 1.87 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
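
The pfn versus address distinction is a single multiplication, sketched
below with a hypothetical helper name. The legacy virtio-pci interface uses
4096-byte pages for the queue address field.

```c
#include <stdint.h>

#define VIRTIO_PAGE_SIZE	4096

/* Convert the 32-bit queue pfn written by the driver to a guest address. */
uint64_t
queue_pfn_to_gpa(uint32_t pfn)
{
	return (uint64_t)pfn * VIRTIO_PAGE_SIZE;
}
```

The widening cast matters: a pfn of 0x100000 already yields an address
above 4 GB, which would not fit in 32 bits.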


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
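
The class of bug is the familiar inclusive versus exclusive upper bound. A
minimal sketch of a correct half-open check, using a hypothetical helper
rather than vmd's actual code:

```c
#include <stdint.h>

/*
 * True when gpa lies in [start, start + size). Comparing the offset
 * against size avoids both the off-by-one and overflow of start + size.
 */
int
gpa_in_range(uint64_t gpa, uint64_t start, uint64_t size)
{
	return gpa >= start && gpa - start < size;
}
```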


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@
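
For illustration, the start of a bios copy that ends at the 4g boundary is
just the boundary minus the image size. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GIB	(1ULL << 32)

/* Guest physical address where a bios copy ending at 4g must start. */
uint64_t
bios_high_copy_start(uint64_t bios_size)
{
	assert(bios_size > 0 && bios_size <= FOUR_GIB);
	return FOUR_GIB - bios_size;
}
```

A 256 KB image, for example, would be placed at 0xfffc0000.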


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.
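
The short-write problem is what an atomicio-style loop guards against. The
function below is a simplified stand-in written for this note, not the
atomicio() used in the tree: it keeps calling write(2) until the whole
buffer has been sent.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Write all len bytes of buf to fd, retrying after short writes and
 * EINTR. Returns len on success or -1 with errno set.
 */
ssize_t
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;
	ssize_t n;

	while (done < len) {
		n = write(fd, p + done, len - done);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {
			errno = EPIPE;	/* no progress: give up */
			return -1;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```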


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin
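
A local nitems() is typically a one-line macro like the one below. The
exact spelling used in vmd is not shown in this log, so treat it as a
sketch; sum_table() is only a usage example.

```c
#include <stddef.h>

#ifndef nitems
#define nitems(_a)	(sizeof((_a)) / sizeof((_a)[0]))
#endif

/* Usage example: walk a fixed table without hard-coding its length. */
int
sum_table(void)
{
	static const int table[] = { 1, 2, 3, 4 };
	size_t i;
	int sum = 0;

	for (i = 0; i < nitems(table); i++)
		sum += table[i];
	return sum;
}
```

The macro only works on true arrays; applied to a pointer it silently
yields the wrong count.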


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
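
In code, the convention looks like the sketch below (try_open() is a
hypothetical helper): the failure test is "== -1", and errno is read only
on that path.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Open path read-only. Returns 0 and stores the descriptor in *fdp,
 * or returns the errno value reported by open(2).
 */
int
try_open(const char *path, int *fdp)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)		/* not "fd < 0" */
		return errno;	/* errno is only meaningful here */
	*fdp = fd;
	return 0;
}
```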


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@
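
A sketch of the set-then-verify pattern follows. The struct layout, the
function names, and the signature string are placeholders invented for this
note; in particular EXAMPLE_SIGNATURE is not the real VMM_HV_SIGNATURE
value.

```c
#include <assert.h>
#include <string.h>

/* Placeholder signature, standing in for VMM_HV_SIGNATURE. */
#define EXAMPLE_SIGNATURE	"EXAMPLESIG01"

/* Hypothetical dump header: only the signature field matters here. */
struct dump_header {
	char	sig[sizeof(EXAMPLE_SIGNATURE) - 1];
	int	version;
};

/* Fill in the header before writing a vm image. */
void
header_init(struct dump_header *h)
{
	assert(h != NULL);
	memset(h, 0, sizeof(*h));
	memcpy(h->sig, EXAMPLE_SIGNATURE, sizeof(h->sig));
}

/* Return 0 if the header carries the expected signature, else -1. */
int
header_check(const struct dump_header *h)
{
	assert(h != NULL);
	return memcmp(h->sig, EXAMPLE_SIGNATURE, sizeof(h->sig)) == 0 ? 0 : -1;
}
```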


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
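
A rough sketch of the idea behind vaddr_mem(), under assumptions: the struct
layout and all names below are hypothetical and do not mirror vmd's real
types. It shows a guest physical address being translated to a host pointer,
and the request being refused when it would not fit inside a single memory
range, which is the assumption the log entry mentions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range. */
struct sketch_memrange {
	uint64_t	 gpa;	/* guest physical start address */
	size_t		 size;	/* length of the range in bytes */
	uint8_t		*hva;	/* where the range is mapped in this process */
};

/*
 * Return a host pointer for [gpa, gpa + len), or NULL if the address is
 * unknown or the span does not fit inside one range.
 */
static void *
sketch_vaddr(const struct sketch_memrange *r, size_t n, uint64_t gpa,
    size_t len)
{
	uint64_t off;
	size_t i;

	for (i = 0; i < n; i++) {
		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		if (off >= r[i].size)
			continue;
		/* The whole span must sit inside this one range. */
		if (len > r[i].size - off)
			return (NULL);
		return (r[i].hva + off);
	}
	return (NULL);
}
```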


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
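
As an illustration of the /31 arrangement above: a /31 subnet holds exactly
two addresses that differ only in the lowest bit, so given the host-side
address the VM-side address can be derived by flipping that bit. The sketch
below assumes addresses in host byte order; the function name is hypothetical
and this is not necessarily how vmd computes it.

```c
#include <stdint.h>

/*
 * Given one IPv4 address of a /31 pair (host byte order), return the other.
 * The two addresses of a /31 differ only in bit 0.
 */
static uint32_t
sketch_slash31_peer(uint32_t addr)
{
	return (addr ^ 1U);
}
```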


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.89 13-May-2023 dv

vmm(4)/vmd(8): switch to anonymous shared mappings.

While splitting out emulated virtio network and block devices into
separate processes, I originally used named mappings via shm_mkstemp(3).
While this functionally achieved the desired result, it had two
unintended consequences:

1) tearing down a vm process and its child processes required
excessive locking as the guest memory was tied into the VFS layer.

2) it was observed by mlarkin@ that actions in other parts of the
VFS layer could cause some of the guest memory to flush to storage,
possibly filling /tmp.

This commit adds a new vmm(4) ioctl dedicated to allowing a process
to request the kernel share a mapping of guest memory into its own vm
space. This requires an open fd to /dev/vmm (requiring root) and
both the "vmm" and "proc" pledge(2) promises. In addition, the caller
must know enough about the original memory ranges to reconstruct them
to make the vm's ranges.

Tested with help from Mischa Peters.

ok mlarkin@


# 1.88 28-Apr-2023 dv

vmd(8)/vmctl(8): allow vm owners to override boot kernel.

vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.

This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.

ok mlarkin@


# 1.87 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
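
To make the pfn versus address distinction concrete, here is a small sketch.
It assumes the 4096-byte page granularity used by the legacy virtio-pci queue
register; the names are hypothetical and not taken from vmd.

```c
#include <stdint.h>

/* Assumed 4 KiB granularity of the legacy virtio-pci queue pfn. */
#define SKETCH_QUEUE_PFN_SHIFT	12

/* Convert the 32-bit queue pfn a driver writes into a guest physical address. */
static uint64_t
sketch_pfn_to_gpa(uint32_t pfn)
{
	/* Widen before shifting so large pfn values do not overflow 32 bits. */
	return ((uint64_t)pfn << SKETCH_QUEUE_PFN_SHIFT);
}
```

Treating the register value as a linear address instead would place the
virtqueue at the wrong location in guest memory for any nonzero pfn.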


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
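
The class of bug described above can be illustrated with a half-open range
check. This is a sketch under assumptions, not the actual vmd code: the struct
and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest memory range: [start, start + size). */
struct sketch_range {
	uint64_t start;	/* first guest physical address in the range */
	uint64_t size;	/* length of the range in bytes */
};

/* Return the index of the range containing gpa, or -1 if none does. */
static int
sketch_find_range(const struct sketch_range *r, size_t n, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/*
		 * Half-open interval: the address start + size is one byte
		 * past the end and must not match. Using "<=" here would be
		 * the off-by-one.
		 */
		if (gpa >= r[i].start && gpa - r[i].start < r[i].size)
			return ((int)i);
	}
	return (-1);
}
```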


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
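
A minimal sketch of the convention described above, using open(2) as the
example system call. The helper name is hypothetical; the point is the exact
comparison against -1 and reading errno only in that case.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open a path read-only; return 0 on success or the errno value on failure. */
static int
sketch_try_open(const char *path)
{
	int fd;

	/* Compare against -1 exactly rather than testing for "< 0". */
	if ((fd = open(path, O_RDONLY)) == -1)
		return (errno);	/* errno is only meaningful on failure */
	close(fd);
	return (0);
}
```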


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
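
The single-bitfield approach can be sketched like this. The flag names and
helpers are hypothetical stand-ins, not the definitions in vmd's headers.

```c
#include <stdint.h>

/* Hypothetical state flags, one bit each. */
#define SKETCH_VM_RUNNING	0x01u
#define SKETCH_VM_PAUSED	0x02u
#define SKETCH_VM_SHUTDOWN	0x04u

/* Return the state with the given flag set. */
static uint32_t
sketch_state_set(uint32_t state, uint32_t flag)
{
	return (state | flag);
}

/* Return the state with the given flag cleared. */
static uint32_t
sketch_state_clear(uint32_t state, uint32_t flag)
{
	return (state & ~flag);
}

/* Return nonzero if the given flag is set in the state. */
static int
sketch_state_has(uint32_t state, uint32_t flag)
{
	return ((state & flag) != 0);
}
```

Keeping every condition in one word means a vm can be reported as, say, both
running and paused without consulting several separate variables.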


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.88 28-Apr-2023 dv

vmd(8)/vmctl(8): allow vm owners to override boot kernel.

vmd allows non-root users to "own" a vm defined in vm.conf(5). While
the user can start/stop the vm, if they break their filesystem they
have no means of booting recovery media like a ramdisk kernel.

This change opens the provided boot kernel via vmctl and passes the
file descriptor through the control channel to vmd. The next boot
of the vm will use the provided file descriptor as boot kernel/bios.
Subsequent boots (e.g. a reboot) will return to using behavior
defined in vm.conf or the default bios image.

ok mlarkin@


# 1.87 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.
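
A rough sketch of turning typed memory ranges into e820 entries, as
described above. The struct and enum names are hypothetical and are not
vmd's. Skipping mmio ranges is an assumption made for the example, and
the entry struct here is not packed as a firmware-facing table would be.
The e820 type values (1 for usable RAM, 2 for reserved) are the standard
ones.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_RAM	1	/* usable memory */
#define E820_RESERVED	2	/* reserved, not usable by the guest OS */

/* Hypothetical purpose tag for a vm memory range. */
enum range_type { RANGE_RAM, RANGE_RESERVED, RANGE_MMIO };

struct typed_range {
	uint64_t	gpa;	/* guest physical start */
	uint64_t	size;	/* length in bytes */
	enum range_type	type;
};

struct e820_entry {
	uint64_t	base;
	uint64_t	len;
	uint32_t	type;
};

/*
 * Fill out[] from in[] and return the number of entries written.
 * out[] must have room for n entries. MMIO ranges are left out of
 * the map (an assumption of this sketch).
 */
static size_t
build_e820(const struct typed_range *in, size_t n, struct e820_entry *out)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (in[i].type == RANGE_MMIO)
			continue;
		out[count].base = in[i].gpa;
		out[count].len = in[i].size;
		out[count].type = (in[i].type == RANGE_RAM) ?
		    E820_RAM : E820_RESERVED;
		count++;
	}
	return count;
}
```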


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
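
To illustrate the pfn versus address distinction above: with the legacy
virtio-pci interface the driver writes a 32-bit page frame number, and
the guest physical address of the queue is that number times the 4096
byte virtio page size. The macro and function names below are
hypothetical, not vmd's.

```c
#include <stdint.h>

/* Legacy virtio-pci queues are addressed in 4096-byte pages. */
#define DEMO_VIRTIO_PAGE_SIZE	4096ULL

/*
 * Convert the 32-bit queue pfn supplied by the driver into a
 * guest physical address. The result needs 64 bits because a
 * large pfn lands above 4 GiB.
 */
static uint64_t
queue_pfn_to_gpa(uint32_t pfn)
{
	return (uint64_t)pfn * DEMO_VIRTIO_PAGE_SIZE;
}
```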


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
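
A minimal sketch of the kind of check involved, not the actual vmd code:
a range covers the half-open interval [gpa, gpa + size), so the address
one byte past the end must not match. Struct and function names are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vm memory range. */
struct mem_range {
	uint64_t	gpa;	/* guest physical start address */
	uint64_t	size;	/* length in bytes */
};

/*
 * True if gpa lies inside [r->gpa, r->gpa + r->size). The strict
 * "<" is the point: using "<=" accepts one byte past the end.
 */
static bool
range_contains(const struct mem_range *r, uint64_t gpa)
{
	return (gpa >= r->gpa && gpa - r->gpa < r->size);
}

/* Return the index of the range holding gpa, or -1 if none does. */
static int
find_range(const struct mem_range *ranges, size_t n, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (range_contains(&ranges[i], gpa))
			return (int)i;
	return -1;
}
```

With two adjacent 4 KiB ranges starting at 0x0 and 0x1000, address 0x1000
resolves to the second range rather than the first.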


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
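
As an illustration of the convention described above (not code from this
commit): compare the return value against -1 exactly, and only consult
errno in that case. The helper name is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Close fd, returning 0 on success or the errno value on failure.
 * The test is "== -1", not "< 0": -1 is the only value that
 * signals an error, and errno is only meaningful after it.
 */
static int
close_checked(int fd)
{
	if (close(fd) == -1)
		return errno;
	return 0;
}
```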


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
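
A small sketch of tracking vm state in one bitfield rather than several
variables. The flag names, values, and struct below are made up for the
example and are not the definitions vmd uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state flags, one bit each. */
#define DEMO_VM_RUNNING		0x01
#define DEMO_VM_PAUSED		0x02
#define DEMO_VM_SHUTDOWN	0x04

struct demo_vm {
	uint32_t	state;	/* OR of DEMO_VM_* flags */
};

static void
vm_state_set(struct demo_vm *vm, uint32_t flag)
{
	vm->state |= flag;
}

static void
vm_state_clear(struct demo_vm *vm, uint32_t flag)
{
	vm->state &= ~flag;
}

static bool
vm_state_has(const struct demo_vm *vm, uint32_t flag)
{
	return (vm->state & flag) != 0;
}
```

A paused vm that is still considered running simply has both bits set,
and reporting code can test each bit independently.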


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@
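
A sketch of how a read of port 0x61 could report the channel 2 output in
bit 5. The macro and function names are hypothetical, and the real
handler in vmd may track the other bits differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 5 of port 0x61 reflects the PIT channel 2 output. */
#define DEMO_PORT61_CH2_OUT	(1u << 5)

/*
 * Build the value returned for a read of port 0x61: keep the
 * other bits from "base" and set or clear bit 5 depending on
 * whether channel 2 has fired.
 */
static uint8_t
port61_read(uint8_t base, bool ch2_fired)
{
	uint8_t v = base & (uint8_t)~DEMO_PORT61_CH2_OUT;

	if (ch2_fired)
		v |= DEMO_PORT61_CH2_OUT;
	return v;
}
```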


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
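
The single-range assumption mentioned above can be sketched like this.
It is an illustration only, with hypothetical names, and not the
vaddr_mem() implementation: the lookup succeeds only when the whole
requested span fits inside one memory range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping of a guest physical range to host memory. */
struct guest_range {
	uint64_t	 gpa;	/* guest physical start */
	size_t		 size;	/* length in bytes */
	uint8_t		*hva;	/* host address backing the range */
};

/*
 * Return a host pointer for [gpa, gpa + len), or NULL if the
 * address is unmapped or the span would cross the end of the
 * range that contains gpa.
 */
static void *
gpa_to_host(const struct guest_range *ranges, size_t n, uint64_t gpa,
    size_t len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct guest_range *r = &ranges[i];
		uint64_t off;

		if (gpa < r->gpa)
			continue;
		off = gpa - r->gpa;
		if (off >= r->size)
			continue;
		if (len > r->size - off)
			return NULL;	/* span leaves the range */
		return r->hva + off;
	}
	return NULL;
}
```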


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.86 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
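
The pfn-versus-address distinction above can be sketched in C. This is an illustration only, not vmd's code: the function name and the QUEUE_PAGE_SHIFT macro are hypothetical, and a 4096-byte page size is assumed.

```c
#include <stdint.h>

/* Assumption: the queue pfn counts 4096-byte pages. */
#define QUEUE_PAGE_SHIFT	12

/*
 * Convert the 32-bit page frame number a driver writes for a
 * virtqueue into the guest physical address of the queue's first byte.
 * Widen before shifting so high pfn bits are not lost.
 */
static uint64_t
queue_pfn_to_gpa(uint32_t pfn)
{
	return ((uint64_t)pfn << QUEUE_PAGE_SHIFT);
}
```

Treating the register value as a linear address would be off by a factor of the page size, which is why the naming matters.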


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
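
A minimal sketch of the kind of check this commit describes, assuming each memory range is a half-open interval [gpa, gpa + size). The struct and function names are hypothetical stand-ins, not vmd's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a guest memory range; not vmd's real struct. */
struct mem_range {
	uint64_t gpa;	/* guest physical address of the first byte */
	uint64_t size;	/* length in bytes */
};

/*
 * Return 1 if gpa lies inside [r->gpa, r->gpa + r->size), else 0.
 * Using "<" rather than "<=" excludes the byte one past the end.
 * Comparing the offset against size avoids overflow in gpa + size.
 */
static int
range_contains(const struct mem_range *r, uint64_t gpa)
{
	return (gpa >= r->gpa && gpa - r->gpa < r->size);
}

/* Return the index of the range holding gpa, or -1 if none does. */
static int
find_range(const struct mem_range *ranges, size_t n, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (range_contains(&ranges[i], gpa))
			return ((int)i);
	return (-1);
}
```

With two adjacent ranges, an inclusive upper bound would match the first range for an address that is really the first byte of the second one.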


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequent.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
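
The convention described above can be illustrated with a short sketch. The wrapper name open_ro is hypothetical and only open(2) is used; it is not taken from vmd.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open path read-only.
 * Returns the descriptor on success, or -1 with errno set by open(2).
 */
static int
open_ro(const char *path)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)		/* compare against -1, not "fd < 0" */
		return (-1);	/* errno is meaningful only on this path */
	return (fd);
}
```

Callers follow the same rule: test the return value against -1 and consult errno only when that test succeeds.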


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
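
The translation idea behind vaddr_mem() can be sketched as follows. This is a simplified illustration under stated assumptions, not the committed function: the struct, its field names, and gpa_to_host() are hypothetical, and each range is assumed to be backed by one contiguous host mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range. */
struct guest_range {
	uint64_t gpa;	/* guest physical start address */
	uint64_t size;	/* length in bytes */
	uint8_t *host;	/* where the range is mapped in this process */
};

/*
 * Return a host pointer for len bytes of guest memory starting at gpa.
 * Returns NULL if gpa is unmapped or if the request would cross the
 * end of the range that contains gpa.
 */
static void *
gpa_to_host(const struct guest_range *ranges, size_t n, uint64_t gpa,
    uint64_t len)
{
	size_t i;
	uint64_t off;

	for (i = 0; i < n; i++) {
		if (gpa < ranges[i].gpa)
			continue;
		off = gpa - ranges[i].gpa;
		if (off >= ranges[i].size)
			continue;
		if (len > ranges[i].size - off)
			return (NULL);	/* spans past this range */
		return (ranges[i].host + off);
	}
	return (NULL);
}
```

Rejecting requests that cross a range boundary mirrors the stated assumption that a contiguous guest physical region sits in a single vm memory range.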


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.85 23-Apr-2023 dv

vmd(8): teach vmm process how to exec.

Use execvp(2) to launch vm children with new address spaces.
Consequently, introduces use of unveil(2) into the vmm and vm
processes.

This imposes the requirement of launching vmd with absolute paths,
similar to sshd(8).

ok mlarkin@


# 1.84 23-Apr-2023 anton

unbreak tree by coping with recent s/XCR0/XFEATURE rename


Revision tags: OPENBSD_7_3_BASE
# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
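
The kind of off-by-one described above can be sketched as follows. This is an illustration only: `struct ex_mem_range` and the `ex_*` functions are hypothetical and do not reproduce vmd's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest memory range; not vmd's actual structure. */
struct ex_mem_range {
	uint64_t	start;	/* first guest physical address */
	uint64_t	size;	/* length in bytes */
};

/* Half-open test: valid addresses are start .. start + size - 1. */
bool
ex_range_contains(const struct ex_mem_range *r, uint64_t gpa)
{
	return (gpa >= r->start && gpa - r->start < r->size);
}

/* Off-by-one variant: also accepts the first byte past the end. */
bool
ex_range_contains_buggy(const struct ex_mem_range *r, uint64_t gpa)
{
	return (gpa >= r->start && gpa - r->start <= r->size);
}

/* Return the first range containing gpa, or NULL if none does. */
const struct ex_mem_range *
ex_find_range(const struct ex_mem_range *ranges, size_t n, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ex_range_contains(&ranges[i], gpa))
			return (&ranges[i]);
	return (NULL);
}
```

With two adjacent ranges, the buggy test attributes the first byte of the second range to the first one, which is how a search can start from the wrong range.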


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@
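
"Ending at 4g" pins down where the copy has to start. A small sketch of that arithmetic, with `EX_4GB` and `ex_bios_alias_start` as hypothetical names chosen for this example:

```c
#include <assert.h>
#include <stdint.h>

#define EX_4GB	0x100000000ULL

/*
 * Guest physical address at which a bios image of the given size
 * must start so that its last byte sits just below 4 GiB.
 */
uint64_t
ex_bios_alias_start(uint64_t bios_size)
{
	assert(bios_size > 0 && bios_size <= EX_4GB);
	return (EX_4GB - bios_size);
}
```

For a 128 KiB image this gives 0xFFFE0000.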


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.
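
An under-read happens when a single read(2) returns fewer bytes than requested and the caller does not retry. The loop below is a generic sketch of the retry idea; it is not the in-tree atomicio implementation, and `ex_read_full` is a name invented here.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read until n bytes have arrived, end-of-file is reached, or a real
 * error occurs. Returns the number of bytes stored in buf; the caller
 * compares it against n to detect a short transfer.
 */
size_t
ex_read_full(int fd, void *buf, size_t n)
{
	char *p = buf;
	size_t done = 0;
	ssize_t r;

	while (done < n) {
		r = read(fd, p + done, n - done);
		if (r == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			break;			/* real error */
		}
		if (r == 0)
			break;			/* end of file */
		done += (size_t)r;
	}
	return (done);
}
```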


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
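
The convention can be illustrated with open(2). The helper name `ex_open_ro` is hypothetical; the point is only the comparison against -1 and reading errno in that case alone.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open path read-only. On failure return -1 and store errno in *errp. */
int
ex_open_ro(const char *path, int *errp)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1) {		/* compare against -1, not "< 0" */
		*errp = errno;	/* errno is meaningful only here */
		return (-1);
	}
	return (fd);
}
```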


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@
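
On x86, CD is bit 30 and NW is bit 29 of %cr0; leaving CD set disables caching, which is consistent with the slowdown described. A sketch of clearing both bits, using `EX_`-prefixed names that are local to this example:

```c
#include <stdint.h>

#define EX_CR0_NW	0x20000000U	/* bit 29: not write-through */
#define EX_CR0_CD	0x40000000U	/* bit 30: cache disable */

/* Return cr0 with the cache-disable and not-write-through bits cleared. */
uint64_t
ex_cr0_enable_caching(uint64_t cr0)
{
	return (cr0 & ~(uint64_t)(EX_CR0_CD | EX_CR0_NW));
}
```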


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
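
A sketch of what tracking state in one bitfield looks like. The flag names, values, and `struct ex_vm` are illustrative assumptions and should not be read as vmd's definitions.

```c
#include <stdbool.h>

/* Illustrative state flags; each state occupies one bit. */
#define EX_STATE_RUNNING	0x01U
#define EX_STATE_PAUSED		0x02U
#define EX_STATE_SHUTDOWN	0x04U

struct ex_vm {
	unsigned int	state;	/* OR of EX_STATE_* flags */
};

void
ex_vm_pause(struct ex_vm *vm)
{
	vm->state |= EX_STATE_PAUSED;
}

void
ex_vm_unpause(struct ex_vm *vm)
{
	vm->state &= ~EX_STATE_PAUSED;
}

/* True if every bit in flag is set in the vm's state. */
bool
ex_vm_is(const struct ex_vm *vm, unsigned int flag)
{
	return ((vm->state & flag) == flag);
}
```

A vm can then be both running and paused at once, and a single integer can be reported or compared instead of several separate variables.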


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@
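
A sketch of composing the value returned for a read of port 0x61. It assumes that a set bit 5 means "fired" and a clear bit means "still counting"; the function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_PORT61_TMR2_OUT	0x20U	/* bit 5: PIT channel 2 status */

/*
 * Build the byte for a port 0x61 read: keep the caller's other bits
 * and reflect the channel 2 status in bit 5.
 */
uint8_t
ex_port61_read(uint8_t other_bits, bool ch2_fired)
{
	uint8_t v;

	v = (uint8_t)(other_bits & ~EX_PORT61_TMR2_OUT);
	if (ch2_fired)
		v |= EX_PORT61_TMR2_OUT;
	return (v);
}
```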


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
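
The single-range assumption can be made explicit in code. The following is a sketch of the translation idea only; `struct ex_guest_range` and `ex_gpa_to_ptr` are hypothetical and are not the signatures of iovec_mem() or vaddr_mem().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping of one guest range into the host process. */
struct ex_guest_range {
	uint64_t	 gpa;	/* guest physical start */
	size_t		 size;	/* length in bytes */
	uint8_t		*hva;	/* where the range is mapped in the host */
};

/*
 * Return a host pointer for [gpa, gpa + len), or NULL if the address
 * is unmapped or the span would run past the end of its range.
 */
void *
ex_gpa_to_ptr(const struct ex_guest_range *ranges, size_t n, uint64_t gpa,
    size_t len)
{
	const struct ex_guest_range *r;
	uint64_t off;
	size_t i;

	for (i = 0; i < n; i++) {
		r = &ranges[i];
		if (gpa < r->gpa)
			continue;
		off = gpa - r->gpa;
		if (off >= r->size)
			continue;
		if (len > r->size - off)
			return (NULL);	/* crosses the range boundary */
		return (r->hva + off);
	}
	return (NULL);
}
```

Returning NULL for a span that crosses a range boundary is a choice made for this sketch; it keeps the caller from touching memory that is not contiguous in the host.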


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
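
A rough sketch of the rate arithmetic, not the formula used in the ns8250 module. It assumes 10 bits on the wire per character (8N1 framing) and takes the tick rate as a parameter; `ex_chars_per_tick` is an invented name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed framing: 1 start bit + 8 data bits + 1 stop bit. */
#define EX_BITS_PER_CHAR	10U

/*
 * Number of characters that may be sent per timer tick at the given
 * baud rate. Never returns 0, so slow rates still make progress.
 */
uint32_t
ex_chars_per_tick(uint32_t baud, uint32_t hz)
{
	uint32_t n;

	assert(hz > 0);
	n = baud / EX_BITS_PER_CHAR / hz;
	return (n > 0 ? n : 1);
}
```

Under these assumptions and 100 ticks per second, 9600 baud allows 9 characters per tick and 115200 baud allows 115, which matches the advice to pick a higher rate for a more responsive console.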


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
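
A /31 holds exactly two addresses that differ only in the lowest bit, so either end can derive the other. A sketch with addresses held in host byte order; `ex_slash31_peer` is a hypothetical helper, and which end vmd assigns to the host is not stated above.

```c
#include <stdint.h>

/*
 * Given one IPv4 address of a /31 pair (host byte order), return the
 * other address of the same pair.
 */
uint32_t
ex_slash31_peer(uint32_t addr)
{
	return (addr ^ 1U);
}
```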


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
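
A minimal sketch of the convention described above, assuming a hypothetical helper try_open() that is not part of vmd: the return value is compared against -1 exactly, and errno is read only in that case.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open a path read-only; return 0 or an errno value. */
static int
try_open(const char *path)
{
	int fd;

	/* Compare against -1 exactly; errno is only meaningful then. */
	if ((fd = open(path, O_RDONLY)) == -1)
		return (errno);
	close(fd);
	return (0);
}
```

Testing for "< 0" usually behaves the same, but the stricter form matches the documented contract of the system calls.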


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
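
As a rough illustration of tracking state in one bitfield, the sketch below uses hypothetical flag names (ST_RUNNING, ST_PAUSED, ST_SHUTDOWN) and helpers; vmd's real macros and values may differ.

```c
#include <stdint.h>

/* Hypothetical flag names for this sketch; vmd's real macros may differ. */
#define ST_RUNNING	0x01
#define ST_PAUSED	0x02
#define ST_SHUTDOWN	0x04

/* Return the state with the given flag set. */
static uint32_t
state_set(uint32_t st, uint32_t flag)
{
	return (st | flag);
}

/* Return the state with the given flag cleared. */
static uint32_t
state_clear(uint32_t st, uint32_t flag)
{
	return (st & ~flag);
}

/* Return nonzero if the given flag is set. */
static int
state_has(uint32_t st, uint32_t flag)
{
	return ((st & flag) != 0);
}
```

A single word can then be reported or checked in one place instead of consulting several independent variables.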


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power-on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
  CentOS 6 (secondary), Ubuntu 17.10 (secondary).
  NOTE: Secondary indicates some issue(s) preventing full/reliable
  functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
  default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
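
The translation described above can be sketched as follows. This is an illustration, not the committed vaddr_mem(): struct guest_range and gpa_to_hva() are hypothetical names, and the sketch returns NULL when a span does not sit inside a single range, which is the assumption the message calls out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest memory range mapped into the host process. */
struct guest_range {
	uint64_t gpa;	/* guest physical start address */
	uint64_t size;	/* length in bytes */
	uint8_t	*hva;	/* host virtual address of the mapping */
};

/*
 * Return a host pointer for [gpa, gpa + len), or NULL if the span does
 * not fit inside a single range.
 */
static void *
gpa_to_hva(struct guest_range *r, size_t n, uint64_t gpa, uint64_t len)
{
	uint64_t off;
	size_t i;

	for (i = 0; i < n; i++) {
		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		/* Written to avoid overflow in off + len. */
		if (off < r[i].size && len <= r[i].size - off)
			return (r[i].hva + off);
	}
	return (NULL);
}
```

A caller could use the returned pointer directly, or place it in a struct iovec for readv(2)/writev(2), avoiding a bounce buffer.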


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (does not work)

ok mlarkin@
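
One way to picture the "upward only" rule is as a subset test on feature bits. The sketch below is an assumption for illustration: can_receive() is a hypothetical name, and a single 32-bit mask stands in for whatever CPUID data vmd actually compares.

```c
#include <stdint.h>

/*
 * Return 1 if every feature bit of the sending host is also present on
 * the receiving host, else 0. Hypothetical helper for this sketch.
 */
static int
can_receive(uint32_t sender_features, uint32_t local_features)
{
	/* Any bit set in sender but not in local blocks the receive. */
	return ((sender_features & ~local_features) == 0);
}
```

A guest that started on the older host has only seen features the newer host also offers, so the move is safe in that direction and refused in the other.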


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to an LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.
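
The first error can be illustrated with a short sketch. An 8-bit IN on x86 replaces only the low byte of %eax, so the emulated read has to merge the device byte into the existing register value. merge_in8() is a hypothetical name, not the function vmd uses.

```c
#include <stdint.h>

/*
 * Merge one byte read from an emulated port into the guest's %eax,
 * leaving bits 8-31 untouched. Hypothetical helper for this sketch.
 */
static uint32_t
merge_in8(uint32_t eax, uint8_t data)
{
	return ((eax & 0xffffff00U) | data);
}
```

Assigning the byte directly to the 32-bit register would clear the upper bits, which is the behavior the message describes as the bug.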


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.83 06-Feb-2023 dv

vmd(8): scan pci bus to determine bootorder strings.

vmd's SeaBIOS bootorder strings had hardcoded pci device ids, so
if a user added a network interface the bootorder strings didn't
line up with reality. Using vmctl(8) to boot from a cdrom (-B cdrom)
would fail, for instance, if attaching both a nic and a disk as
well.

This change scans the pci devices and finds the first of each type
to construct viable bootorder strings.

ok jan@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
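
The pfn versus address distinction can be made concrete with a small sketch. In the legacy virtio-pci interface the driver writes the queue's guest physical address divided by 4096; the names VQ_PAGE_SHIFT and vq_pfn_to_gpa() below are hypothetical and chosen for this illustration.

```c
#include <stdint.h>

#define VQ_PAGE_SHIFT	12	/* legacy virtio-pci queue pages are 4096 bytes */

/* Convert a 32-bit queue pfn into a guest physical address. */
static uint64_t
vq_pfn_to_gpa(uint32_t pfn)
{
	/* Widen before shifting so large pfn values are not truncated. */
	return ((uint64_t)pfn << VQ_PAGE_SHIFT);
}
```

Treating the 32-bit value as an address instead would limit queues to the low 4 GB and misplace them by a factor of 4096.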


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
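
A sketch of the corrected check, under the assumption that a range is described by a start address and a size: the valid interval is half-open, so the first byte past the end must be rejected. struct mem_range and range_contains() are hypothetical names, not vmd's.

```c
#include <stdint.h>

/* Hypothetical stand-in for a guest memory range; not vmd's actual struct. */
struct mem_range {
	uint64_t gpa;	/* guest physical start address */
	uint64_t size;	/* length in bytes */
};

/* Return 1 if gpa lies in [r->gpa, r->gpa + r->size), else 0. */
static int
range_contains(const struct mem_range *r, uint64_t gpa)
{
	/* Subtracting first avoids overflow in r->gpa + r->size. */
	return (gpa >= r->gpa && gpa - r->gpa < r->size);
}
```

An off-by-one of the kind described would correspond to using "<=" in the second comparison, which accepts the address one byte past the end.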


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.
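
The under-read problem mentioned above arises because read(2) may return fewer bytes than requested. The sketch below is written in the spirit of atomicio but is not its source; read_full() is a hypothetical name.

```c
#include <sys/types.h>
#include <errno.h>
#include <unistd.h>

/*
 * Read until len bytes have arrived, EOF is hit, or an error occurs.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t
read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;
	ssize_t n;

	while (done < len) {
		n = read(fd, p + done, len - done);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return (-1);
		}
		if (n == 0)
			break;	/* EOF before len bytes arrived */
		done += (size_t)n;
	}
	return ((ssize_t)done);
}
```

A caller compares the result with the requested length, so a short transfer between processes is detected instead of being treated as a complete message.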


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get thsoe patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have lead to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when hanlding the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intented to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vpcus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to sets the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end continually de-queueing and re-queueing the
event continually. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power-on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@
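
On PC-compatible hardware, bit 5 of port 0x61 reflects the output of PIT channel 2. A minimal sketch of composing the value returned for a read of that port is shown below; ex_port61_read() and its parameters are hypothetical and do not mirror vmd(8)'s handler.

```c
#include <stdint.h>

#define EX_PORT61_TMR2_OUT	0x20	/* bit 5: PIT channel 2 output */

/*
 * Hypothetical helper: "latched" is the last value written to the port,
 * "ch2_fired" is nonzero once the channel 2 counter has expired.
 */
uint8_t
ex_port61_read(uint8_t latched, int ch2_fired)
{
	uint8_t v = latched & (uint8_t)~EX_PORT61_TMR2_OUT;

	if (ch2_fired)
		v |= EX_PORT61_TMR2_OUT;
	return (v);
}
```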


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
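
The translation described above, including the assumption that a requested span lies within a single memory range, might look roughly like the sketch below. The struct and the function ex_vaddr_mem() are hypothetical stand-ins, not vmd(8)'s actual types or signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range. */
struct ex_mem_range {
	uint64_t	 gpa;	/* guest physical start address */
	size_t		 size;	/* length of the range in bytes */
	uint8_t		*hva;	/* where the range is mapped in the host */
};

/*
 * Return a host pointer for [gpa, gpa + len), or NULL if the span is
 * unmapped or does not fit inside a single range.
 */
void *
ex_vaddr_mem(const struct ex_mem_range *r, size_t nranges, uint64_t gpa,
    size_t len)
{
	size_t i;
	uint64_t off;

	for (i = 0; i < nranges; i++) {
		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		if (off >= r[i].size)
			continue;
		/* The span must not run past the end of this range. */
		if (len > r[i].size - off)
			return (NULL);
		return (r[i].hva + off);
	}
	return (NULL);
}
```

Returning a direct pointer avoids copying guest data into a bounce buffer, at the cost of rejecting spans that straddle two ranges.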


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@
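
One way to express an "upward only" rule is a subset test on feature bitmasks: the receiving host must offer every feature the sending host exposed. The sketch below assumes features are summarised in a single 32-bit mask, which is a simplification; ex_can_receive() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical check: returns nonzero when every feature bit set on the
 * sender is also set on the receiver.
 */
int
ex_can_receive(uint32_t sender_features, uint32_t receiver_features)
{
	return ((sender_features & ~receiver_features) == 0);
}
```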


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
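
The pacing arithmetic can be illustrated as follows. The sketch assumes the standard PC UART clock (115200 baud at divisor 1) and 10 bits on the wire per character (8N1 framing); both helper names are hypothetical and the code is not taken from the ns8250 module.

```c
#include <stdint.h>

/* Baud rate selected by a 16-bit divisor latch; 0 is treated as invalid. */
uint32_t
ex_baud_from_divisor(uint16_t divisor)
{
	if (divisor == 0)
		return (0);
	return (115200 / divisor);
}

/* Microseconds needed to transmit nchars characters at the given rate. */
uint64_t
ex_tx_time_usec(uint32_t baud, uint32_t nchars)
{
	if (baud == 0)
		return (0);
	/* 10 bits per character: start bit, 8 data bits, stop bit. */
	return (((uint64_t)nchars * 10 * 1000000) / baud);
}
```

Under these assumptions a burst of 16 characters takes about 16.7 ms at 9600 baud and about 1.4 ms at 115200 baud, which is why the higher rate gives a more responsive console.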


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
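
A /31 subnet contains exactly two addresses that differ only in the lowest bit, so one side's address can be derived from the other. The sketch below shows just that relationship on addresses in host byte order; ex_slash31_peer() is a hypothetical helper, and how vmd chooses the subnet itself is not shown here.

```c
#include <stdint.h>

/* Return the other address in the same /31 (host byte order). */
uint32_t
ex_slash31_peer(uint32_t addr)
{
	return (addr ^ 1);
}
```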


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.82 28-Jan-2023 dv

Move some header definitions from vmm(4) to vmd(8).

Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.

ok mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
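
In the legacy virtio-pci interface the driver writes a 32-bit page frame number, and the device obtains the guest physical address by multiplying it by the 4096-byte page size. A minimal sketch of that conversion, with the hypothetical name ex_vq_gpa():

```c
#include <stdint.h>

#define EX_VIRTIO_PAGE_SHIFT	12	/* 4096-byte pages */

/* Convert a 32-bit queue PFN into a 64-bit guest physical address. */
uint64_t
ex_vq_gpa(uint32_t pfn)
{
	/* Widen before shifting so large PFNs are not truncated. */
	return ((uint64_t)pfn << EX_VIRTIO_PAGE_SHIFT);
}
```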


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
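
The class of bug described above comes from treating a range as closed rather than half-open. A sketch of a half-open check, using the hypothetical name ex_gpa_in_range():

```c
#include <stdint.h>

/*
 * Nonzero when gpa lies in [start, start + size). The address
 * start + size itself is one byte past the end and is rejected.
 */
int
ex_gpa_in_range(uint64_t gpa, uint64_t start, uint64_t size)
{
	/* Subtract first so start + size cannot overflow. */
	return (gpa >= start && gpa - start < size);
}
```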


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@
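
Keeping sizes in bytes while still requiring whole megabytes amounts to a conversion plus a divisibility check. The helpers below are hypothetical and only illustrate the arithmetic.

```c
#include <stdint.h>

#define EX_MIB	(1024ULL * 1024ULL)

/* Convert a size given in megabytes to bytes. */
uint64_t
ex_mib_to_bytes(uint64_t mib)
{
	return (mib * EX_MIB);
}

/* Nonzero when a byte count is an exact multiple of one megabyte. */
int
ex_is_whole_mib(uint64_t bytes)
{
	return ((bytes % EX_MIB) == 0);
}
```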


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
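
A /31 holds exactly two addresses that differ only in the lowest bit,
so the second address can be derived from the first. A small sketch
follows; the helper name is hypothetical, and which of the two
addresses vmd gives to the host or the VM is not stated in this log.

```c
#include <stdint.h>

/*
 * Given one IPv4 address of a /31 in host byte order, return the
 * other address of the same /31 by flipping the lowest bit.
 */
uint32_t
ex_slash31_peer(uint32_t addr)
{
	return (addr ^ 1U);
}
```

For example, the peer of 100.64.0.0 (0x64400000) is 100.64.0.1, and
applying the function twice returns the original address.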


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.81 08-Jan-2023 dv

vmd(8): add thread names to vm process.

ok guenther@.


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
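
To illustrate the pfn versus address distinction: legacy virtio-pci
drivers write the queue location as a page frame number in units of
4096 bytes. A minimal sketch of the conversion, with hypothetical
names that are not taken from vmd:

```c
#include <stdint.h>

/* Legacy virtio-pci queue addresses are expressed in 4096-byte pages. */
#define EX_VIRTIO_PAGE_SHIFT	12

/* Convert the 32-bit pfn written by the driver to a guest physical address. */
uint64_t
ex_queue_pfn_to_gpa(uint32_t pfn)
{
	/* Widen before shifting so the high pfn bits are not lost. */
	return ((uint64_t)pfn << EX_VIRTIO_PAGE_SHIFT);
}
```

Treating the 32-bit value as a linear address instead would limit
queues to the low 4 GB and point at the wrong location entirely.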


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
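
The kind of check this fix is about can be sketched as a half-open
interval test. The struct and function names below are assumptions
made for illustration, not vmd's actual data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest memory range covering [gpa, gpa + size). */
struct ex_mem_range {
	uint64_t gpa;	/* guest physical base address */
	uint64_t size;	/* length in bytes */
};

/* Return 1 if addr lies inside r; the end address itself is excluded. */
int
ex_range_contains(const struct ex_mem_range *r, uint64_t addr)
{
	return (addr >= r->gpa && addr - r->gpa < r->size);
}

/* Return the index of the range holding addr, or -1 if none does. */
int
ex_find_range(const struct ex_mem_range *ranges, size_t n, uint64_t addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ex_range_contains(&ranges[i], addr))
			return ((int)i);
	return (-1);
}
```

Using "<=" against the end of the range would accept the first byte of
the next range and pick the wrong starting range, which is the
off-by-one described above.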


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vms with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.
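
For readers unfamiliar with the under-read problem: a single read(2)
on a pipe or socket may return fewer bytes than requested. The loop
below is a generic sketch of the idea and is not the atomicio helper
used by vmd; the function name is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Keep reading until len bytes have arrived. Returns the number of
 * bytes read (less than len only on EOF) or -1 on error.
 */
ssize_t
ex_read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;
	ssize_t n;

	while (done < len) {
		n = read(fd, p + done, len - done);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return (-1);
		}
		if (n == 0)
			break;			/* EOF before len bytes */
		done += (size_t)n;
	}
	return ((ssize_t)done);
}
```

A caller that needs a complete message compares the return value with
len and treats anything shorter as a failure.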


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads: event_thread, which runs all the callbacks, and the
vcpu thread, when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequent.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
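
A short sketch of the convention, using open(2) as the example system
call. The wrapper name is hypothetical and the code is illustrative
only; it is not taken from the diff.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open path read-only; return 0 on success or the errno value on failure. */
int
ex_try_open(const char *path)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)		/* compare against -1, not "< 0" */
		return (errno);	/* errno is only meaningful on this path */
	close(fd);
	return (0);
}
```

Checking for exactly -1 matches what the system call documents, and
reading errno only inside that branch avoids picking up a stale value.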


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
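
As an illustration of the single-bitfield approach, the sketch below
uses made-up flag names and values; they are assumptions and are not
vmd's real state constants.

```c
#include <stdint.h>

/* Hypothetical state flags, one bit each. */
#define EX_VM_RUNNING	0x01
#define EX_VM_PAUSED	0x02
#define EX_VM_SHUTDOWN	0x04

/* Return state with flag set. */
uint32_t
ex_state_set(uint32_t state, uint32_t flag)
{
	return (state | flag);
}

/* Return state with flag cleared. */
uint32_t
ex_state_clear(uint32_t state, uint32_t flag)
{
	return (state & ~flag);
}

/* Return 1 if flag is set in state, 0 otherwise. */
int
ex_state_has(uint32_t state, uint32_t flag)
{
	return ((state & flag) != 0);
}
```

A vm that is both running and paused is then simply
EX_VM_RUNNING | EX_VM_PAUSED, and reporting code only has to look at
one field.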


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd-usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
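
The single-range assumption mentioned for vaddr_mem() can be sketched
as follows. This is not the committed function: the struct, its field
names and ex_vaddr are hypothetical, and the sketch returns NULL when
a request would cross a range boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range. */
struct ex_guest_range {
	uint64_t gpa;	/* guest physical base address */
	uint64_t size;	/* length in bytes */
	uint8_t *hva;	/* where the daemon mapped this range */
};

/*
 * Return a daemon-side pointer for [gpa, gpa + len), or NULL if the
 * address is unknown or the span does not fit inside a single range.
 */
void *
ex_vaddr(const struct ex_guest_range *ranges, size_t n, uint64_t gpa,
    uint64_t len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct ex_guest_range *r = &ranges[i];
		uint64_t off;

		if (gpa < r->gpa || gpa - r->gpa >= r->size)
			continue;
		off = gpa - r->gpa;
		if (len > r->size - off)
			return (NULL);	/* would cross into another range */
		return (r->hva + off);
	}
	return (NULL);
}
```

An iovec-based variant would instead split such a request into one
entry per range, which is the case the log describes for iovec_mem().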


# 1.80 04-Jan-2023 dv

Typos in vmd error message. No functional change.


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
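
The class of bug described in this entry can be sketched as follows. The struct and function names are hypothetical; the sketch only shows a half-open range test in which an address equal to base + size belongs to the next range, and it does not reproduce the vmd code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range. */
struct ex_range {
	uint64_t gpa;	/* first guest physical address in the range */
	uint64_t size;	/* length in bytes */
};

/*
 * Return the index of the range containing gpa, or -1 if none does.
 * The range is treated as [gpa, gpa + size): the byte at gpa + size
 * is NOT part of it. Comparing the offset against size also avoids
 * overflow in gpa + size.
 */
static int
ex_find_range(const struct ex_range *r, size_t n, uint64_t gpa)
{
	for (size_t i = 0; i < n; i++) {
		if (gpa >= r[i].gpa && gpa - r[i].gpa < r[i].size)
			return (int)i;
	}
	return -1;
}
```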


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.
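
For readers unfamiliar with the under-write problem mentioned in this entry, here is a minimal sketch of a write loop that keeps going until the whole buffer is sent. It is a stand-in written for illustration, with a hypothetical name, and is not the atomicio implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write exactly len bytes or fail.
 * A single write(2) may transfer fewer bytes than requested, so the
 * loop retries with the remainder. Returns len on success, -1 on error.
 */
static ssize_t
ex_write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;

	while (done < len) {
		ssize_t n = write(fd, p + done, len - done);

		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		if (n == 0) {
			errno = EPIPE;	/* no progress possible */
			return -1;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```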


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
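
The convention described in this entry can be shown with a small sketch. The wrapper name is hypothetical; close(2) is used only because it is an easy system call to demonstrate with.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: returns 0 on success or the errno value on
 * failure. The failure test is "== -1", because -1 is the only value
 * that signals an error and the only case in which errno is updated.
 */
static int
ex_close_checked(int fd)
{
	if (close(fd) == -1)
		return errno;
	return 0;
}
```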


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
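
A minimal sketch of the single-bitfield approach described in this entry follows. The flag names, values, and struct are invented for the example and are not the definitions used by vmd.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state flags; one bit per state. */
#define EX_VM_RUNNING	0x01
#define EX_VM_PAUSED	0x02
#define EX_VM_SHUTDOWN	0x04

/* Hypothetical vm record holding all states in one word. */
struct ex_vm {
	uint32_t state;
};

static void
ex_vm_set(struct ex_vm *vm, uint32_t flags)
{
	vm->state |= flags;
}

static void
ex_vm_clear(struct ex_vm *vm, uint32_t flags)
{
	vm->state &= ~flags;
}

/* True if any of the given flags is set. */
static bool
ex_vm_is(const struct ex_vm *vm, uint32_t flags)
{
	return (vm->state & flags) != 0;
}
```

With one word of state, reporting or checking several conditions at once becomes a single mask operation instead of a comparison per variable.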


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@
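
The readback described in this entry can be sketched as below. The function name, the mask name, and the idea of passing in the other port bits as a latched value are assumptions for illustration; only the use of bit 5 of port 0x61 comes from the entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 5 of port 0x61 reflects the PIT channel 2 output. */
#define EX_PORT61_TIMER2_OUT	(1u << 5)

/*
 * Hypothetical helper: compose the byte returned for a read of port
 * 0x61 from the other (latched) bits plus the channel 2 status.
 */
static uint8_t
ex_port61_read(uint8_t latched, bool ch2_fired)
{
	uint8_t v = latched & (uint8_t)~EX_PORT61_TIMER2_OUT;

	if (ch2_fired)
		v |= EX_PORT61_TIMER2_OUT;	/* fired */
	return v;				/* bit clear: still counting */
}
```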


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
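
To make the single-range assumption in this entry concrete, here is a sketch of a guest-physical to host-pointer lookup. The struct, the function name, and the signature are hypothetical and do not match vaddr_mem() itself; the sketch only shows the idea of refusing a request that would cross a range boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mapped guest memory range. */
struct ex_mem_range {
	uint64_t gpa;	/* guest physical start */
	size_t size;	/* length in bytes */
	uint8_t *hva;	/* where the range is mapped in this process */
};

/*
 * Return a host pointer for [gpa, gpa + len), or NULL if gpa is not
 * mapped or the span does not fit inside a single range.
 */
static void *
ex_vaddr_mem(const struct ex_mem_range *r, size_t n, uint64_t gpa,
    size_t len)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t off;

		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		if (off >= r[i].size)
			continue;
		if (len > r[i].size - off)
			return NULL;	/* would cross the range end */
		return r[i].hva + off;
	}
	return NULL;
}
```

A caller that gets a pointer back can read or write guest memory in place, which is what removes the need for a bounce buffer.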


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@
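
One way to picture the "upward only" rule in this entry is as a subset test on feature bits. The function below is a hypothetical sketch that treats each host's cpu features as a single 32-bit mask; the real check may compare several feature words.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: a vm may be received only if every feature bit
 * the sending host exposed is also present on the receiving host.
 */
static bool
ex_can_receive(uint32_t sender_features, uint32_t receiver_features)
{
	return (sender_features & ~receiver_features) == 0;
}
```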


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
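
The rate calculation described in this entry might look roughly like the sketch below. The 10 bits per character figure (8N1 framing) and the function name are assumptions for illustration, and the exact formula used by the ns8250 module is not given in this log.

```c
#include <stdint.h>

/* Assumption: 8N1 framing, so each character costs 10 bit times. */
#define EX_BITS_PER_CHAR	10

/*
 * Hypothetical helper: how many characters may be sent per clock tick
 * (hz ticks per second) at the programmed baud rate before a delay
 * should be inserted. Always allows at least one character.
 */
static uint32_t
ex_chars_per_tick(uint32_t baud, uint32_t hz)
{
	uint32_t n;

	if (hz == 0)
		return 1;
	n = baud / EX_BITS_PER_CHAR / hz;
	return n > 0 ? n : 1;
}
```

With a 100 Hz tick, this gives about 9 characters per tick at 9600 baud and 115 at 115200, which matches the advice in the entry to raise the guest's baud rate.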


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
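
The /31 layout in this entry leaves exactly two addresses per subnet, so each side's address is the other's with the lowest bit flipped. The helpers below are a hypothetical sketch on host-byte-order IPv4 values; which of the two addresses goes to the host and which to the VM is not stated here and is left open.

```c
#include <stdint.h>

/* The other address in the same /31 (host byte order). */
static uint32_t
ex_slash31_peer(uint32_t addr)
{
	return addr ^ 1u;
}

/* The even (lowest) address of the /31 that contains addr. */
static uint32_t
ex_slash31_network(uint32_t addr)
{
	return addr & ~1u;
}
```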


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get thsoe patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
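
A minimal sketch of the convention described above, assuming a
hypothetical helper open_checked() that is not part of vmd(8). It compares
the return value against -1 exactly and reads errno only in that case.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Open a path read-only. On failure return -1 and store errno in
 * *saved_errno; on success return the descriptor and store 0.
 */
int
open_checked(const char *path, int *saved_errno)
{
	int fd;

	assert(saved_errno != NULL);

	fd = open(path, O_RDONLY);
	if (fd == -1) {		/* test for -1, not "fd < 0" */
		*saved_errno = errno;
		return (-1);
	}
	*saved_errno = 0;
	return (fd);
}
```

The caller is responsible for close(2) on a successful return.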


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slowly

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
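
A minimal sketch of tracking several states in one bitfield. The flag
names, values, and struct below are hypothetical and chosen only for
illustration; vmd's actual constants may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state flags, one bit each. */
#define VM_SKETCH_RUNNING	0x01u
#define VM_SKETCH_PAUSED	0x02u
#define VM_SKETCH_SHUTDOWN	0x04u

struct vm_sketch {
	unsigned int	state;	/* OR of VM_SKETCH_* flags */
};

/* Set one or more state flags. */
void
vm_state_set(struct vm_sketch *vm, unsigned int flags)
{
	assert(vm != NULL);
	vm->state |= flags;
}

/* Clear one or more state flags. */
void
vm_state_clear(struct vm_sketch *vm, unsigned int flags)
{
	assert(vm != NULL);
	vm->state &= ~flags;
}

/* Return 1 if every flag in "flags" is set, else 0. */
int
vm_state_test(const struct vm_sketch *vm, unsigned int flags)
{
	assert(vm != NULL);
	return ((vm->state & flags) == flags);
}
```

A single word then answers questions such as "running and paused?" with
one mask test instead of several separate variables.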


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@
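
A minimal sketch of reporting PIT channel 2 status in bit 5 of a port 0x61
read, as the message describes. The function and macro names are
hypothetical; how vmd(8) derives the "fired" state is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define PORT61_CH2_STATUS_BIT	5	/* bit position named in the message */

/*
 * Build the byte returned for a read of port 0x61: keep the other bits
 * as given and set bit 5 only if channel 2 has fired.
 */
uint8_t
port61_read_sketch(uint8_t other_bits, int ch2_fired)
{
	uint8_t v;

	v = other_bits & (uint8_t)~(1u << PORT61_CH2_STATUS_BIT);
	if (ch2_fired)
		v |= (uint8_t)(1u << PORT61_CH2_STATUS_BIT);
	return (v);
}
```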


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.79 28-Dec-2022 jmc

spelling fixes; from paul tagliamonte
any parts of his diff not taken are noted on tech


# 1.78 26-Dec-2022 dv

vmd(8): provide a detailed e820 memory map.

When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
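
A minimal sketch of the pfn versus address distinction mentioned above.
It assumes the 4096-byte page unit used by the legacy virtio-pci queue
address register; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VQ_SKETCH_PAGE_SIZE	4096u	/* assumed legacy virtio page unit */

/*
 * Convert the 32-bit page frame number written by the guest driver into
 * a guest physical address. The result can exceed 32 bits, so widen
 * before multiplying.
 */
uint64_t
vq_pfn_to_gpa(uint32_t pfn)
{
	return ((uint64_t)pfn * VQ_SKETCH_PAGE_SIZE);
}
```

Treating the raw register value as a linear address would be off by a
factor of the page size for every queue.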


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
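
A minimal sketch of the kind of range check this fix concerns, using a
hypothetical struct and function rather than vmd's own. A range is the
half-open interval [start, start + size), so the byte at start + size must
be rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one guest memory range; not vmd's struct. */
struct mem_range_sketch {
	uint64_t	start;	/* first guest physical address in the range */
	uint64_t	size;	/* length of the range in bytes */
};

/*
 * Return 1 if gpa falls inside [start, start + size), else 0.
 * Writing the upper bound as "gpa <= start + size" would accept one
 * byte past the end of the range.
 */
int
range_contains(const struct mem_range_sketch *r, uint64_t gpa)
{
	assert(r != NULL);
	return (gpa >= r->start && gpa - r->start < r->size);
}
```

Comparing the offset (gpa - start) against size also avoids overflow when
start + size would wrap.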


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get thsoe patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have lead to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when hanlding the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intented to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vpcus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to sets the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end continually de-queueing and re-queueing the
event continually. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where the OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.77 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
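
The off-by-one described in this entry is the usual inclusive-versus-exclusive end comparison. A minimal sketch of a half-open range check follows, assuming a hypothetical struct mem_range and helper names that are not taken from vmd's source:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a guest memory range; not vmd's actual struct. */
struct mem_range {
	uint64_t gpa;	/* guest physical address of the first byte */
	uint64_t size;	/* length of the range in bytes */
};

/* Return 1 if gpa lies inside [r->gpa, r->gpa + r->size), else 0. */
static int
range_contains(const struct mem_range *r, uint64_t gpa)
{
	assert(r != NULL);
	/* Subtract rather than add so the end address cannot overflow. */
	return (gpa >= r->gpa && gpa - r->gpa < r->size);
}

/* Return the index of the range holding gpa, or -1 if none does. */
static int
find_range(const struct mem_range *ranges, size_t n, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (range_contains(&ranges[i], gpa))
			return ((int)i);
	return (-1);
}
```

Using "<=" against the end address instead of "<" would accept the first byte past a range, so an address at the start of the next range would match the wrong one.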


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequent.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
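
The convention described in this entry can be sketched as follows. try_open() is a hypothetical helper written for this example, not a function from vmd; only open(2), close(2), and errno are real interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open a path read-only.
 * Returns 0 on success, or the errno value set by open(2) on failure.
 */
static int
try_open(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd == -1)		/* compare against -1, not "fd < 0" */
		return (errno);	/* errno is only meaningful after a -1 return */
	close(fd);
	return (0);
}
```

Testing for exactly -1 matches the documented failure value of the system call, and errno is read only on that path.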


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.76 11-Nov-2022 dv

Revert removal of toggling interrupt line in vmd vcpu run loop.

phessler reports a performance regression. Needs more testing.


# 1.75 10-Nov-2022 dv

vmd(8): remove toggling interrupt line on vcpu in vcpu run loop

We toggle the interrupt "line" on the vcpu when we assert or deassert
irq on the pic in either the vcpu thread (emulating some devices)
or on the device event thread (mostly handling reading available
data). Having it in the vcpu run loop here just results in another
ioctl(2) call before the one for re-entering the guest cpu.

Removing it shows no noticeable behavioral change in existing guests.

ok mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
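
A minimal sketch of the kind of half-open range check this fix implies.
The struct and function names are hypothetical and are not taken from
vm.c. The point is only that a range starting at gpa with length size
covers [gpa, gpa + size), so the byte at gpa + size already belongs to
whatever range follows.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a guest memory range; not vmd's real struct. */
struct sk_mem_range {
	uint64_t gpa;	/* guest physical start address */
	uint64_t size;	/* length in bytes */
};

/* Return 1 if gpa lies in [r->gpa, r->gpa + r->size), else 0. */
static int
sk_range_contains(const struct sk_mem_range *r, uint64_t gpa)
{
	/* Subtracting first avoids overflow in r->gpa + r->size. */
	return (gpa >= r->gpa && gpa - r->gpa < r->size);
}

/* Return the index of the range holding gpa, or -1 if none does. */
static int
sk_find_range(const struct sk_mem_range *ranges, size_t n, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (sk_range_contains(&ranges[i], gpa))
			return ((int)i);
	return (-1);
}
```

With two adjacent 4 KiB ranges, an address exactly at the end of the
first range resolves to the second range rather than the first.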


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@
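
The placement arithmetic can be sketched as follows. The helper name is
hypothetical and the code is not from vm.c; it only shows that a copy
"ending at 4g" starts at 4 GiB minus the image size.

```c
#include <stdint.h>

#define SK_4G	0x100000000ULL	/* 4 GiB boundary */

/*
 * Hypothetical helper: return the guest physical address at which an
 * image of bios_size bytes must start so that it ends exactly at the
 * 4 GiB boundary.
 */
static uint64_t
sk_bios_copy_base(uint64_t bios_size)
{
	return (SK_4G - bios_size);
}
```

For a 256 KiB image (a size assumed here only for illustration), the
copy would start at 0xFFFC0000.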


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin
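
For reference, a local nitems() is commonly written along these lines.
This is a sketch of the usual idiom, not a quote of the definition that
was added to vmd.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#ifndef nitems
#define nitems(_a)	(sizeof((_a)) / sizeof((_a)[0]))
#endif
```

The macro only works on arrays whose size is known at compile time;
applied to a pointer it silently yields the wrong count.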


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
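
A small sketch of the convention described above, using write(2). The
helper name is hypothetical and the code is not from vm.c; it compares
the return value against -1 exactly and consults errno only in that
case.

```c
#include <sys/types.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all len bytes of buf to fd, retrying when
 * interrupted. Returns 0 on success, -1 on error with errno set.
 */
static int
sk_write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	ssize_t n;

	while (len > 0) {
		n = write(fd, p, len);
		if (n == -1) {		/* not "n < 0" */
			if (errno == EINTR)
				continue;
			return (-1);
		}
		p += n;
		len -= (size_t)n;
	}
	return (0);
}
```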


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
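
A rough illustration of tracking state in one bitfield. The flag and
function names below are invented for this sketch and should not be
read as vmd's actual identifiers.

```c
/* Hypothetical state flags; vmd's real names and values may differ. */
#define SK_STATE_RUNNING	0x01
#define SK_STATE_PAUSED		0x02
#define SK_STATE_SHUTDOWN	0x04

struct sk_vm {
	unsigned int state;	/* OR of SK_STATE_* flags */
};

static void
sk_state_set(struct sk_vm *vm, unsigned int flag)
{
	vm->state |= flag;
}

static void
sk_state_clear(struct sk_vm *vm, unsigned int flag)
{
	vm->state &= ~flag;
}

static int
sk_state_isset(const struct sk_vm *vm, unsigned int flag)
{
	return ((vm->state & flag) != 0);
}
```

Because every state lives in one word, reporting or comparing the
overall state of a vm becomes a single integer read.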


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@
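
A sketch of how the readback could be composed, assuming the usual
PC/AT layout in which bit 5 of port 0x61 reflects the timer 2 output.
The names are hypothetical and the code is not from vmd's i8253
emulation.

```c
#include <stdint.h>

#define SK_PORT61_TMR2_OUT	(1u << 5)	/* PIT channel 2 output */

/*
 * Hypothetical helper: build the value returned for a read of port
 * 0x61 from the other port bits and the channel 2 state.
 */
static uint8_t
sk_port61_read(uint8_t base, int ch2_fired)
{
	uint8_t v = base & (uint8_t)~SK_PORT61_TMR2_OUT;

	if (ch2_fired)
		v |= SK_PORT61_TMR2_OUT;
	return (v);
}
```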


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@
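
One way to express "upward only" is as a subset test on feature bits:
every feature the sending host exposed must also exist on the receiving
host. The sketch assumes features are compared as flat 32-bit masks and
uses a hypothetical function name; the actual check in vmd may be
structured differently.

```c
#include <stdint.h>

/*
 * Return 1 if a vm sent from a host with feature mask src can be
 * received on a host with feature mask dst, i.e. src is a subset of dst.
 */
static int
sk_features_compatible(uint32_t src, uint32_t dst)
{
	return ((src & ~dst) == 0);
}
```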


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
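
The pacing arithmetic can be approximated as below. This is a sketch
under the assumption of 8N1 framing (10 bits on the wire per character)
and does not reproduce the formula or constants used in the ns8250
module; both function names are hypothetical.

```c
#include <stdint.h>

/* Characters per second sustainable at a given baud rate (8N1). */
static unsigned int
sk_chars_per_second(unsigned int baud)
{
	return (baud / 10);
}

/* Microseconds a burst of nchars characters occupies at that rate. */
static unsigned int
sk_burst_usec(unsigned int baud, unsigned int nchars)
{
	return ((unsigned int)((uint64_t)nchars * 10 * 1000000 / baud));
}
```

Under these assumptions a 16-character burst takes about 16.7 ms at
9600 baud but only about 1.4 ms at 115200 baud, which is why the higher
setting feels much closer to the old, unthrottled console.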


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
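
A /31 holds exactly two addresses that differ only in the lowest bit,
so either side can derive its peer with one XOR. The helper below is a
hypothetical illustration in host byte order; which of the two
addresses vmd assigns to the host and which to the VM is not stated in
this log entry.

```c
#include <stdint.h>

/* Given one IPv4 address of a /31 (host byte order), return the other. */
static uint32_t
sk_slash31_peer(uint32_t addr)
{
	return (addr ^ 1u);
}
```

For example, 100.64.1.2 (0x64400102) pairs with 100.64.1.3.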


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.
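
The first bug comes down to how an 8-bit IN result is merged into the
guest register: only the low byte (%al) is written by the instruction,
so the remaining bits must be carried over. A minimal sketch with a
hypothetical helper name, not taken from the ns8250 code:

```c
#include <stdint.h>

/*
 * Merge the byte read from an emulated port into the guest's rax,
 * leaving every bit above %al untouched.
 */
static uint64_t
sk_merge_inb(uint64_t rax, uint8_t data)
{
	return ((rax & ~0xffULL) | data);
}
```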


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.74 10-Nov-2022 dv

vmd(8): import mmio decode and emulation, disabled for now.

The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.

Only change to vmm(4) is reordering register #define's in vmmvar.h.

ok mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get thsoe patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have lead to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequent.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
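
As an illustration of the convention described above, the sketch below
compares the return value against -1 exactly and reads errno only on that
path. open_ro() is a hypothetical helper written for this example, not a
function from vmd.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from vmd): open a file read-only.
 * Returns the descriptor, or -1 with the error stored in *saved_errno.
 */
int
open_ro(const char *path, int *saved_errno)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1) {			/* not "fd < 0" */
		*saved_errno = errno;	/* errno is only meaningful here */
		return (-1);
	}
	*saved_errno = 0;
	return (fd);
}
```

The caller checks for -1 in the same way and closes the descriptor with
close(2) when done.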


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
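
A rough sketch of the single-bitfield approach, for readers unfamiliar with
the pattern. The flag names and values below are hypothetical and chosen for
illustration; vmd's real definitions may differ.

```c
/* Hypothetical flag names and values; vmd's real definitions may differ. */
#define ST_RUNNING	0x01
#define ST_PAUSED	0x02
#define ST_SHUTDOWN	0x04

/* Return st with the given state bit(s) set. */
unsigned int
state_set(unsigned int st, unsigned int flag)
{
	return (st | flag);
}

/* Return st with the given state bit(s) cleared. */
unsigned int
state_clear(unsigned int st, unsigned int flag)
{
	return (st & ~flag);
}

/* Non-zero if any of the given state bit(s) are set in st. */
int
state_has(unsigned int st, unsigned int flag)
{
	return ((st & flag) != 0);
}
```

With one word holding all states, reporting or checking a vm's state is a
single mask operation instead of a comparison per variable.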


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
  CentOS 6 (secondary), Ubuntu 17.10 (secondary).
  NOTE: Secondary indicates some issue(s) preventing full/reliable
  functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
  default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.73 01-Sep-2022 dv

vmm(4): send all port io emulation to userland

Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.

An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)

While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
not assume the underlying integer the compiler may assign.

ok mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
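
The off-by-one above comes down to treating a memory range as the half-open
interval [base, base + size). A minimal sketch, assuming a simplified
struct mem_range and helper names that are hypothetical and not vmd's own:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a guest memory range; not vmd's real struct. */
struct mem_range {
	uint64_t base;	/* first guest physical address in the range */
	uint64_t size;	/* length in bytes */
};

/*
 * A range covers [base, base + size). Accepting gpa == base + size
 * would treat the byte just past the end as part of the range.
 */
int
range_contains(const struct mem_range *r, uint64_t gpa)
{
	/* Subtracting first avoids overflow in base + size. */
	return (gpa >= r->base && gpa - r->base < r->size);
}

/* Return the index of the range holding gpa, or -1 if none does. */
int
find_range(const struct mem_range *ranges, size_t n, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (range_contains(&ranges[i], gpa))
			return ((int)i);
	return (-1);
}
```

With two adjacent 4 KB ranges starting at 0x0 and 0x1000, address 0x1000
belongs to the second range. An inclusive upper bound would have matched the
first range instead.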


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get thsoe patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have lead to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when hanlding the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intented to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vpcus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to sets the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end continually de-queueing and re-queueing the
event continually. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
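
A minimal sketch of the kind of lookup vaddr_mem() is described as doing,
not the code from vm.c. The struct guest_range type and the gpa_to_host()
name are hypothetical. It shows the assumption stated above: the requested
span must sit entirely inside one memory range, otherwise no pointer is
returned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range (not vmd's struct). */
struct guest_range {
	uint64_t gpa;	/* guest physical address of the first byte */
	size_t	 size;	/* length of the range in bytes */
	uint8_t	*hva;	/* where the range is mapped in this process */
};

/*
 * Translate a guest physical address into a pointer usable by the host
 * process. Returns NULL if gpa is not backed by any range, or if the
 * len bytes starting at gpa would run past the end of the range.
 */
static void *
gpa_to_host(const struct guest_range *ranges, size_t n, uint64_t gpa,
    size_t len)
{
	uint64_t off;
	size_t i;

	for (i = 0; i < n; i++) {
		if (gpa < ranges[i].gpa)
			continue;
		off = gpa - ranges[i].gpa;
		if (off >= ranges[i].size)
			continue;
		/* The whole span must fit inside this single range. */
		if (len > ranges[i].size - off)
			return (NULL);
		return (ranges[i].hva + off);
	}
	return (NULL);
}
```

A caller would use the returned pointer directly, for example to read a
virtio ring in place, and treat NULL as an invalid guest address.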


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
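
The arithmetic behind that pacing can be sketched as follows. This is an
illustration only, not the ns8250 code: it assumes 10 bits on the wire per
character (1 start, 8 data, 1 stop), which the log message does not state,
and the function names and the pauses_per_sec parameter are hypothetical.

```c
/*
 * Approximate characters per second at a given baud rate, assuming
 * 10 bits on the wire per character (1 start + 8 data + 1 stop).
 */
static unsigned int
chars_per_second(unsigned int baud)
{
	return (baud / 10);
}

/*
 * Hypothetical pacing helper: how many characters may be sent between
 * pauses if the sender pauses pauses_per_sec times each second.
 * Always allows at least one character so output cannot stall.
 */
static unsigned int
chars_per_burst(unsigned int baud, unsigned int pauses_per_sec)
{
	unsigned int n;

	if (pauses_per_sec == 0)
		return (chars_per_second(baud));
	n = chars_per_second(baud) / pauses_per_sec;
	return (n > 0 ? n : 1);
}
```

Under these assumptions 9600 baud sustains about 960 characters per second
and 115200 baud about 11520, which is why the faster setting gives a more
responsive console.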


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.72 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@
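
A minimal sketch of the kind of check involved, not the code from vm.c.
The struct mem_range type and both function names are hypothetical. A
range covers the half-open interval [gpa, gpa + size), so the address one
byte past the end belongs to the next range, if any.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a guest memory range; not vmd's real struct. */
struct mem_range {
	uint64_t gpa;	/* guest physical address of the first byte */
	uint64_t size;	/* length of the range in bytes */
};

/*
 * Return 1 if gpa lies inside the half-open interval
 * [r->gpa, r->gpa + r->size), 0 otherwise. Using "<=" on the upper
 * bound would wrongly accept the first byte after the range.
 */
static int
range_contains(const struct mem_range *r, uint64_t gpa)
{
	return (gpa >= r->gpa && gpa - r->gpa < r->size);
}

/* Return the index of the range containing gpa, or -1 if none does. */
static int
find_range(const struct mem_range *ranges, size_t n, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (range_contains(&ranges[i], gpa))
			return ((int)i);
	return (-1);
}
```

With two adjacent ranges, an address equal to the end of the first one
resolves to the second range rather than the first.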


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.71 29-Jun-2022 dv

vmd(8): fix off by one in vm memory range check

When inspecting if a gpa falls into a known memory range, vmd was
considering it valid 1 byte past the end resulting in selecting the
wrong starting range for the search.

ok mlarkin@


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads: event_thread, which runs all the callbacks, and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
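
Editorial sketch, not taken from the commit: the convention described above
means comparing a system call's return value against -1 exactly and reading
errno only in that case. The helper name ex_try_open is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if the path can be opened read-only,
 * otherwise the errno value reported by open(2).
 */
int
ex_try_open(const char *path)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)		/* not "fd < 0": -1 is the only error return */
		return (errno);	/* errno is meaningful only in this case */
	close(fd);
	return (0);
}
```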


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
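
Editorial sketch, not taken from the commit: tracking state in one bitfield
rather than several variables might look roughly like this. The flag and
function names are hypothetical and are not copied from vmd.h.

```c
#include <stdint.h>

/* Hypothetical flag names for illustration only. */
#define EX_STATE_RUNNING	0x01u
#define EX_STATE_PAUSED		0x02u
#define EX_STATE_SHUTDOWN	0x04u

struct ex_vm {
	uint32_t	state;	/* OR of EX_STATE_* flags */
};

/* Set the paused flag without disturbing the other flags. */
void
ex_pause(struct ex_vm *vm)
{
	vm->state |= EX_STATE_PAUSED;
}

/* Clear the paused flag without disturbing the other flags. */
void
ex_unpause(struct ex_vm *vm)
{
	vm->state &= ~EX_STATE_PAUSED;
}

/* Report whether the paused flag is set. */
int
ex_is_paused(const struct ex_vm *vm)
{
	return ((vm->state & EX_STATE_PAUSED) != 0);
}
```

A single word of flags can be reported or compared in one step, which is the
convenience the commit message points at.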


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
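
Editorial sketch, not taken from the commit: one way to turn a guest physical
address range into an iovec array suitable for readv(2) or writev(2). The
struct ex_mem_range layout and the ex_iovec_mem signature are assumptions made
for illustration, and the ranges are assumed to be sorted by ascending guest
address; vmd's real iovec_mem() may differ.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range mapped into vmd. */
struct ex_mem_range {
	uint64_t	 gpa;	/* guest physical start address */
	size_t		 size;	/* length of the range in bytes */
	char		*hva;	/* where the range is mapped in this process */
};

/*
 * Fill iov[] so that it covers guest addresses [gpa, gpa + len).
 * Returns the number of iovec entries used, or -1 if the span is not
 * fully backed by the ranges or does not fit in iovcnt entries.
 */
int
ex_iovec_mem(const struct ex_mem_range *r, size_t nr, uint64_t gpa,
    size_t len, struct iovec *iov, int iovcnt)
{
	uint64_t off;
	size_t chunk, i;
	int n = 0;

	for (i = 0; i < nr && len > 0; i++) {
		if (gpa < r[i].gpa || gpa >= r[i].gpa + r[i].size)
			continue;	/* address is not inside this range */
		if (n == iovcnt)
			return (-1);	/* caller's array is too small */
		off = gpa - r[i].gpa;
		chunk = r[i].size - off;	/* bytes left in this range */
		if (chunk > len)
			chunk = len;
		iov[n].iov_base = r[i].hva + off;
		iov[n].iov_len = chunk;
		n++;
		gpa += chunk;
		len -= chunk;
	}
	return (len == 0 ? n : -1);
}
```

The resulting array can be handed straight to readv(2) or writev(2), so data
moves between the disk image and guest memory without an intermediate copy.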


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to an LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.70 26-Jun-2022 dv

vmd: create a copy of bios at 4g boundary

Newer Linux kernels call into the bios to perform a reboot and our
version of SeaBIOS assumes there's a "copy" of the bios ending at
4g. When SeaBIOS reads from this area, since vmd doesn't perform
mmio yet, guests terminate with an unhandled fault.

Carve out some space ending at 4g and copy the bios there. Technically
we could load garbage there, but give SeaBIOS what it wants for
now.

ok mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get thsoe patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have lead to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when hanlding the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intented to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vpcus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to sets the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end continually de-queueing and re-queueing the
event continually. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
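
To make the rate arithmetic concrete, here is a rough sketch in C. It is not the ns8250 code from this commit: the function name is hypothetical, and the 10 bits per character figure assumes 8N1 framing (1 start, 8 data, 1 stop bit), which the message does not state.

```c
#include <stdint.h>

/* Assumption: 8N1 framing, so each character costs 10 bit times. */
#define BITS_PER_CHAR	10

/*
 * Microseconds a real UART would need to shift out nchars characters
 * at the given baud rate. Returns 0 for a baud rate of 0.
 */
uint64_t
tx_time_usec(uint32_t baud, uint32_t nchars)
{
	if (baud == 0)
		return (0);
	return ((uint64_t)nchars * BITS_PER_CHAR * 1000000ULL / baud);
}
```

Under that assumption, 9600 baud sustains about 960 characters per second and 115200 baud about 11520, which is why the higher setting gives a noticeably faster console.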


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
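
A small sketch in C of the addressing idea: a /31 subnet holds exactly two addresses that differ only in the lowest bit, so one can be derived from the other. The function name is hypothetical and this is not how vmd itself computes the addresses.

```c
#include <stdint.h>

/*
 * Given one IPv4 address of a /31 pair (in host byte order), return the
 * other address of the pair by flipping the lowest bit.
 */
uint32_t
peer_in_slash31(uint32_t addr)
{
	return (addr ^ 1u);
}
```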


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.69 03-May-2022 dv

vmm/vmd/vmctl: standardize memory units to bytes

At different points in the vm lifecycle vmm(4), vmctl(8), and vmd(8)
refer to a vm's memory range sizes in either bytes or megabytes.
This is needlessly complex.

Switch to using bytes everywhere and adjust types and constants
accordingly. While this makes it possible to specify vm's with
memory in fractions of megabytes, the logic requiring whole
megabyte values remains.

Feedback from deraadt@, mlarkin@, and Matthew Martin.

ok mlarkin@


Revision tags: OPENBSD_7_1_BASE
# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.
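
The under-write problem mentioned above comes from write(2) being allowed to transfer fewer bytes than requested. The following C sketch shows the general looping technique; it is a simplified stand-in with a hypothetical name, not the atomicio helper used in the tree.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Keep calling write(2) until all len bytes are written.
 * Returns 0 on success, -1 on error (errno is left set by write).
 */
int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	ssize_t n;

	while (len > 0) {
		n = write(fd, p, len);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return (-1);
		}
		p += n;
		len -= (size_t)n;
	}
	return (0);
}
```

The same loop shape applies to the read side, with the addition that a return value of 0 from read(2) means end of file and must stop the loop.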


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
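
The convention described above looks like this in practice. The sketch below is a generic C example with a hypothetical helper name, not code from vm.c: the return value is compared against -1 exactly, and errno is consulted only on that path.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open a path read-only.
 * Returns 0 on success, or the errno value reported by open(2).
 */
int
try_open(const char *path)
{
	int fd;

	/* Test for -1 exactly, not "< 0"; errno is only valid here. */
	if ((fd = open(path, O_RDONLY)) == -1)
		return (errno);
	close(fd);
	return (0);
}
```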


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
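
A minimal C sketch of tracking state in a single bitfield follows. The flag names and helper functions are hypothetical and chosen for illustration; the constants actually used by vmd may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state flags, one bit each. */
#define VMS_RUNNING	0x01
#define VMS_PAUSED	0x02
#define VMS_SHUTDOWN	0x04

/* Return the state with the given flag set. */
uint32_t
state_set(uint32_t state, uint32_t flag)
{
	return (state | flag);
}

/* Return the state with the given flag cleared. */
uint32_t
state_clear(uint32_t state, uint32_t flag)
{
	return (state & ~flag);
}

/* Report whether the given flag is set. */
bool
state_has(uint32_t state, uint32_t flag)
{
	return ((state & flag) != 0);
}
```

With this layout a vm can be both running and paused at once, and a single integer is enough to report or compare the combined state.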


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.68 01-Mar-2022 dv

vmd(8): gracefully handle hitting data limits when starting a vm

With recent changes to login.conf(5) to restrict daemon datasize
to a finite value, users can now hit resource limits when attempting
to start a vm.

This change fixes the error path when hitting the limit. vmd(8)
will no longer abort and memory error messages are relayed to the
user.

While here, address potential under-reads/writes using atomicio
when relaying data between the child vm process and vmd's vmm
process.

Original diff from tedu@. OK mlarkin@.


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get thsoe patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have lead to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when hanlding the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intented to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequent.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.67 30-Dec-2021 claudio

Add back support for -B net -b bsd.rd which emulates a PXE install and
results in an autoinstall. This can be used to quickly create new OpenBSD
installs.
OK dv@


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.66 29-Nov-2021 deraadt

mostly avoid sys/param.h with a local nitems()
ok mlarkin


Revision tags: OPENBSD_7_0_BASE
# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works; the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads: event_thread, which runs all the callbacks, and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
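
A minimal illustration of the convention described in 1.50. This is a
sketch, not code from vm.c; the helper name open_checked() is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of vmd(8): open a file read-only and
 * detect failure by comparing the return value against -1 exactly,
 * rather than testing for "< 0". On failure the errno value set by
 * open(2) is stored in *errp; on success *errp is set to 0.
 */
int
open_checked(const char *path, int *errp)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1) {		/* not "fd < 0" */
		*errp = errno;
		return (-1);
	}
	*errp = 0;
	return (fd);
}
```

errno is only meaningful after the call has returned -1, which is why the
sketch copies it immediately inside the error branch.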


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
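
The single-bitfield idea from 1.46 can be sketched like this. The EX_STATE_*
flags, struct ex_vm and the ex_* helpers are hypothetical names chosen for
illustration; they are not the definitions used in vmd.h.

```c
#include <stdint.h>

/* Hypothetical state flags; one bit per state. */
#define EX_STATE_RUNNING	0x01
#define EX_STATE_PAUSED		0x02
#define EX_STATE_SHUTDOWN	0x04

struct ex_vm {
	uint32_t state;		/* replaces several separate int fields */
};

/* Set one or more state bits. */
void
ex_state_set(struct ex_vm *vm, uint32_t flags)
{
	vm->state |= flags;
}

/* Clear one or more state bits. */
void
ex_state_clear(struct ex_vm *vm, uint32_t flags)
{
	vm->state &= ~flags;
}

/* Return 1 if all of the given bits are set, 0 otherwise. */
int
ex_state_test(const struct ex_vm *vm, uint32_t flags)
{
	return ((vm->state & flags) == flags);
}
```

A single field like this can be reported or compared in one step, which is
the benefit the commit message mentions.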


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd-usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
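
A rough sketch of the idea behind iovec_mem() as described in 1.25:
translating a guest physical range into iovec entries that point directly
at host memory. The names struct ex_range and ex_iovec_mem(), and the
signature, are assumptions made for this example and are not the actual
vmd(8) definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical description of one guest memory range. */
struct ex_range {
	uint64_t gpa;	/* guest physical start address */
	size_t	 size;	/* length of the range in bytes */
	uint8_t	*hva;	/* where the range is mapped in this process */
};

/*
 * Fill iov[] so that it covers len bytes starting at guest physical
 * address gpa. Returns the number of entries used, or -1 if part of
 * the span is unmapped or iov[] is too small.
 */
int
ex_iovec_mem(const struct ex_range *r, size_t nr, uint64_t gpa, size_t len,
    struct iovec *iov, int iovcnt)
{
	size_t i, off, chunk;
	int n = 0;

	while (len > 0) {
		if (n == iovcnt)
			return (-1);
		/* Find the range that contains the current address. */
		for (i = 0; i < nr; i++)
			if (gpa >= r[i].gpa && gpa - r[i].gpa < r[i].size)
				break;
		if (i == nr)
			return (-1);
		off = gpa - r[i].gpa;
		chunk = r[i].size - off;
		if (chunk > len)
			chunk = len;
		iov[n].iov_base = r[i].hva + off;
		iov[n].iov_len = chunk;
		n++;
		gpa += chunk;
		len -= chunk;
	}
	return (n);
}
```

The resulting array can be handed to readv(2) or writev(2), so data moves
between a disk image and guest memory without an intermediate bounce
buffer.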


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas



# 1.65 01-Sep-2021 dv

remove unused functions and cleanup vmd.h

Discussed with mlarkin@. These functions were implemented but never
used. While in vmd.h, fix the order to match current vmd(8) reality.


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get thsoe patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have lead to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when hanlding the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intented to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vpcus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to sets the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end continually de-queueing and re-queueing the
event continually. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
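
A minimal sketch of the kind of translation described above, assuming a
small table of guest memory ranges. struct mem_range and gpa_to_host() are
hypothetical names for illustration and are not the actual vaddr_mem()
implementation; the sketch only shows the "must sit in a single range"
assumption as an explicit bounds check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range mapped into vmd. */
struct mem_range {
	uint64_t gpa;	/* guest physical start address */
	size_t	 size;	/* length of the range in bytes */
	uint8_t	*hva;	/* where the range is mapped in this process */
};

/*
 * Return a host pointer for [gpa, gpa + len), or NULL if the span is not
 * fully contained in one range.
 */
void *
gpa_to_host(const struct mem_range *r, size_t nranges, uint64_t gpa,
    size_t len)
{
	size_t i;
	uint64_t off;

	for (i = 0; i < nranges; i++) {
		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		/* Compare this way round to avoid overflow in off + len. */
		if (off < r[i].size && len <= r[i].size - off)
			return (r[i].hva + off);
	}
	return (NULL);
}
```

A caller would use the returned pointer directly (for example as an
iov_base) instead of copying guest memory through a bounce buffer.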


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This prevents receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
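
As an aside on the /31 addressing: a /31 contains exactly two addresses
that differ only in the lowest bit, so either one can be derived from the
other. The helper below is a hypothetical illustration (peer_in_slash31()
is not a vmd function), and it makes no claim about which of the two
addresses vmd assigns to the host and which to the VM.

```c
#include <stdint.h>

/*
 * Given one IPv4 address (host byte order) inside a /31, return the
 * other address of that /31 by flipping the lowest bit.
 */
uint32_t
peer_in_slash31(uint32_t addr)
{
	return (addr ^ 1);
}
```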


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to an LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. The use of timeouts and VM ACKs makes sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch; the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.64 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works; the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads: event_thread, which runs all the callbacks, and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block, also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
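
A small illustration of the convention, assuming a hypothetical helper
try_open() that is not part of vmd: the failure test compares against -1
exactly, and errno is only consulted on that path.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 on success, or the errno value reported by open(2). */
int
try_open(const char *path)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)		/* not "fd < 0" */
		return (errno);	/* errno is only meaningful here */
	close(fd);
	return (0);
}
```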


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slowly

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
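
A rough sketch of the single-bitfield approach. The flag names and helper
functions below are hypothetical and chosen for illustration; the actual
definitions in vmd may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state flags, one bit each. */
#define ST_RUNNING	0x01
#define ST_PAUSED	0x02
#define ST_SHUTDOWN	0x04

struct vm_info {
	uint32_t state;	/* OR of ST_* flags, replaces separate variables */
};

static inline void
state_set(struct vm_info *vm, uint32_t flag)
{
	vm->state |= flag;
}

static inline void
state_clear(struct vm_info *vm, uint32_t flag)
{
	vm->state &= ~flag;
}

static inline bool
state_has(const struct vm_info *vm, uint32_t flag)
{
	return ((vm->state & flag) != 0);
}
```

With one word of state, a caller can test or report several conditions at
once, for example a vm that is both running and paused.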


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power-on defaults, and bumps the VM
send/receive header to reflect the same

discussed with deraadt@


# 1.63 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get thsoe patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have lead to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when hanlding the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intented to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vpcus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to sets the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slowly

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.62 05-Apr-2021 dv

Support booting from compressed kernel images.

The bsd.rd ramdisk now ships gzip'd on amd64. Use libz in base to
transparently handle decompression of any compressed kernel images.

Patch from Josh Rickmar.

ok kn@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


# 1.61 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never made it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happened ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get those patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequent.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
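
A minimal sketch of the convention described above, not the actual diff.
The helper name open_errno() is hypothetical: it compares the return of
open(2) against -1 exactly and reads errno only on that path.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open a path read-only.
 * Returns 0 on success, or the errno value reported by open(2).
 */
int
open_errno(const char *path)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)		/* exactly -1, not "fd < 0" */
		return (errno);	/* errno is only meaningful here */
	close(fd);
	return (0);
}
```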


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
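
A sketch of the single-bitfield approach, under stated assumptions: the
EX_STATE_* names and struct ex_vm are made up for illustration and are
not vmd's actual definitions.

```c
#include <stdint.h>

/* Hypothetical flag names; vmd's real definitions may differ. */
#define EX_STATE_RUNNING	0x01
#define EX_STATE_PAUSED		0x02
#define EX_STATE_SHUTDOWN	0x04

struct ex_vm {
	uint32_t state;		/* OR of EX_STATE_* bits */
};

/* Set one or more state bits. */
static inline void
ex_set(struct ex_vm *vm, uint32_t bits)
{
	vm->state |= bits;
}

/* Clear one or more state bits. */
static inline void
ex_clear(struct ex_vm *vm, uint32_t bits)
{
	vm->state &= ~bits;
}

/* Nonzero only if all of the requested bits are set. */
static inline int
ex_isset(const struct ex_vm *vm, uint32_t bits)
{
	return ((vm->state & bits) == bits);
}
```

One word now answers "running?", "paused?" and "running and paused?",
which would otherwise need one variable per question.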


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@
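
The derived-then-base lookup described above can be sketched roughly as
follows. This is an illustration only: struct ex_image and
ex_read_cluster() are hypothetical in-memory stand-ins, not the qcow2
code in vmd, and reading 0 for a cluster allocated nowhere is an
assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for a disk image. */
struct ex_image {
	const struct ex_image	*base;		/* base image, or NULL */
	const uint8_t		*clusters;	/* one byte of "data" per cluster */
	const uint8_t		*present;	/* nonzero if cluster is allocated */
	size_t			 nclusters;
};

/*
 * Look the cluster up in the derived image first, then walk down the
 * chain of base images until one has it allocated.
 */
uint8_t
ex_read_cluster(const struct ex_image *img, size_t idx)
{
	for (; img != NULL; img = img->base) {
		if (idx < img->nclusters && img->present[idx])
			return (img->clusters[idx]);
	}
	return (0);	/* allocated nowhere in the chain */
}
```

The derived image stores only what differs from its base, which is why
modifying the base afterwards changes what the derived image reads.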


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
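
A rough sketch of the address translation behind a vaddr_mem()-style
lookup, under stated assumptions: struct ex_mem_range and
ex_vaddr_mem() are hypothetical and do not reproduce vmd's real types
or signatures. It encodes the same restriction noted above, namely that
the requested span must sit inside a single memory range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range mapped into the host. */
struct ex_mem_range {
	uint64_t gpa;	/* guest physical start address */
	size_t	 size;	/* length in bytes */
	uint8_t	*hva;	/* where the range is mapped in this process */
};

/*
 * Return a host pointer for [gpa, gpa + len), or NULL when the span
 * does not sit entirely inside one range.
 */
void *
ex_vaddr_mem(const struct ex_mem_range *r, size_t n, uint64_t gpa, size_t len)
{
	uint64_t off;
	size_t i;

	for (i = 0; i < n; i++) {
		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		/* off < size keeps the subtraction below from wrapping. */
		if (off < r[i].size && len <= r[i].size - off)
			return (r[i].hva + off);
	}
	return (NULL);
}
```

With such a pointer a device model can touch guest memory in place, for
example a virtio ring, instead of copying it through a bounce buffer.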


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.60 19-Mar-2021 kn

Remove booting from kernels in raw/qcow2 images

Diff and (slightly tweaked) text below from
Dave Voutila < dave at sisu dot io >, thanks!

--
Since 6.7 switched to FFS2 as the default filesystem for new installs,
the ability for vmd(8) to load a kernel and boot.conf from a disk image
directly (without SeaBIOS) has been broken.

A diff from tb to add FFS2 support never mdae it into the tree.

On 5th Jan 2021, new ramdisks for amd64 have started shipping gzipped,
breaking the ability to load the bsd.rd directly as a kernel image for a vmd
guest without first uncompressing the image.

Using BIOS works, the FFS2 change happend ten months ago and few if any have
complained about the breakage. vmctl(8) is still vague about supporting it
per its man page and one still has to pass the disk image twice as a "-b"
and "-d" argument to boot an OpenBSD guest *without* BIOS.

Josh Rickmar reported the gzip issue on bugs@ and provided patches to add
support for compressed ramdisks and kernel images. The easiest way to do so
is to drop support for FFS images since they require a call to fmemopen(3)
while all the other logic uses fopen(3)/fdopen(3) calls and a file
descriptor. It is much easier to get thsoe patches merged if they don't
have to account for extracting files from disk images.
--

No objections anyone
"Removing it makes sense" reyk (who wrote the FFS module)
OK mlarkin


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have lead to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when hanlding the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intented to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vpcus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to sets the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end continually de-queueing and re-queueing the
event continually. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.
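
The first bug can be illustrated with a small sketch: an 8-bit or 16-bit port read should replace only the low bytes of the register and leave the rest alone. The helper name `ex_merge_io_result` and its parameters are hypothetical and do not come from vmd.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: store the result of an "in" instruction of
 * `size` bytes (1, 2 or 4) into a 32-bit register value, leaving the
 * bytes above the operand size untouched.
 */
uint32_t
ex_merge_io_result(uint32_t eax, uint32_t data, unsigned int size)
{
	uint32_t mask;

	assert(size == 1 || size == 2 || size == 4);
	mask = (size == 4) ? 0xffffffffU : ((1U << (size * 8)) - 1U);
	return (eax & ~mask) | (data & mask);
}
```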


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Timeouts and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@
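
One way such a check could look is sketched below, assuming it compares the source address of each frame sent by the guest with the address assigned to the VM. All names here (`ex_frame_src_allowed`, the `EX_` constants) are hypothetical, and the actual implementation may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_ETHER_ADDR_LEN	6	/* bytes in a MAC address */
#define EX_ETHER_HDR_LEN	14	/* dst MAC + src MAC + ethertype */

/*
 * Hypothetical check: return 1 if the frame carries the MAC address
 * assigned to the VM as its source, 0 otherwise.
 */
int
ex_frame_src_allowed(const uint8_t *frame, size_t len,
    const uint8_t *assigned)
{
	assert(frame != NULL && assigned != NULL);

	/* Too short to contain an Ethernet header: reject. */
	if (len < EX_ETHER_HDR_LEN)
		return 0;
	/* The source address follows the 6-byte destination address. */
	return memcmp(frame + EX_ETHER_ADDR_LEN, assigned,
	    EX_ETHER_ADDR_LEN) == 0;
}
```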


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.59 13-Feb-2021 mlarkin

Fix some wrong comments and KNF/long line wraps


Revision tags: OPENBSD_6_8_BASE
# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads: event_thread, which runs all the callbacks, and the
vcpu thread when running exit handlers. This could have led to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (i.e. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(i.e. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequent.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
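
The convention can be shown with a short sketch. The helper `ex_try_open` is made up for illustration and is not part of vmd; it only demonstrates comparing against -1 exactly and reading errno in that case alone.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Illustrative helper: try to open `path` read-only.
 * Returns 0 on success, or the errno value reported by open(2).
 */
int
ex_try_open(const char *path)
{
	int fd;

	assert(path != NULL);

	/* Compare against -1 exactly; errno is only meaningful then. */
	if ((fd = open(path, O_RDONLY)) == -1)
		return errno;
	close(fd);
	return 0;
}
```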


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slowly

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@
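
A minimal sketch of the set-then-check pattern follows. The header layout and the signature string are placeholders: the real value of VMM_HV_SIGNATURE and the real header structure are not shown in this log, so every name below is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Placeholder value, standing in for the real signature string. */
#define EX_SIGNATURE	"EXAMPLE-SIG"

/* Hypothetical, simplified dump header. */
struct ex_dump_header {
	char	signature[12];
};

/* Writer side: fill in the signature before dumping. */
void
ex_header_init(struct ex_dump_header *hdr)
{
	assert(sizeof(EX_SIGNATURE) <= sizeof(hdr->signature));
	memset(hdr, 0, sizeof(*hdr));
	memcpy(hdr->signature, EX_SIGNATURE, sizeof(EX_SIGNATURE));
}

/* Reader side: return 1 if the signature matches, else 0. */
int
ex_header_valid(const struct ex_dump_header *hdr)
{
	return memcmp(hdr->signature, EX_SIGNATURE,
	    sizeof(EX_SIGNATURE)) == 0;
}
```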


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
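
The idea of one bitfield replacing several variables might look like the sketch below. The flag names and values are hypothetical and chosen only for the example; vmd's own constants may differ.

```c
#include <assert.h>

/* Hypothetical flag names; vmd's own constants may differ. */
#define EX_STATE_RUNNING	0x01U
#define EX_STATE_PAUSED		0x02U
#define EX_STATE_SHUTDOWN	0x04U

unsigned int
ex_state_set(unsigned int state, unsigned int flag)
{
	return state | flag;
}

unsigned int
ex_state_clear(unsigned int state, unsigned int flag)
{
	return state & ~flag;
}

int
ex_state_has(unsigned int state, unsigned int flag)
{
	/* Callers are expected to test one flag at a time. */
	assert(flag != 0 && (flag & (flag - 1)) == 0);
	return (state & flag) != 0;
}
```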


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@
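
The readback can be sketched as a handler that reports the channel 2 output in bit 5 of the value read from port 0x61. The function and macro names are hypothetical, and how the remaining bits are produced is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PORT61_CH2_OUT	0x20	/* bit 5: PIT channel 2 output */

/*
 * Hypothetical read handler for port 0x61: report whether channel 2 has
 * fired in bit 5 and pass the other bits through from `latched`.
 */
uint8_t
ex_port61_read(uint8_t latched, int ch2_fired)
{
	uint8_t val = latched & (uint8_t)~EX_PORT61_CH2_OUT;

	if (ch2_fired)
		val |= EX_PORT61_CH2_OUT;
	assert(((val & EX_PORT61_CH2_OUT) != 0) == (ch2_fired != 0));
	return val;
}
```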


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
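
The single-range assumption behind vaddr_mem() can be sketched as follows. This is not the function from the commit: `ex_mem_range` and `ex_vaddr_mem` are hypothetical stand-ins, written only to show a lookup that fails when the requested span does not fit inside one range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range. */
struct ex_mem_range {
	uint64_t gpa;	/* guest physical start address */
	size_t	 size;	/* length in bytes */
	uint8_t	*hva;	/* where the range is mapped in this process */
};

/*
 * Return a pointer for guest physical address `gpa` that is valid for
 * `len` bytes, or NULL if [gpa, gpa + len) is not contained in a single
 * range.
 */
void *
ex_vaddr_mem(const struct ex_mem_range *r, size_t nranges, uint64_t gpa,
    size_t len)
{
	size_t i;

	assert(r != NULL || nranges == 0);
	for (i = 0; i < nranges; i++) {
		if (gpa < r[i].gpa || gpa - r[i].gpa >= r[i].size)
			continue;
		/* Found the range holding gpa; the span must fit inside it. */
		if (len > r[i].size - (gpa - r[i].gpa))
			return NULL;
		return r[i].hva + (gpa - r[i].gpa);
	}
	return NULL;
}
```

Returning a direct pointer is what lets a caller read a virtio ring in place instead of copying it into a bounce buffer first.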


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@
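
The rule can be expressed as a subset test on feature bits, sketched below under the assumption that each host's features are summarized in one bitmask. The function name and the single-mask representation are hypothetical; the commit does not show how vmd compares the feature sets.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical check: `src` and `dst` are bitmasks of CPU features on
 * the sending and receiving hosts. The VM may be received only if every
 * feature present on the sender also exists on the receiver.
 */
int
ex_can_receive(uint64_t src, uint64_t dst)
{
	int ok = (src & ~dst) == 0;

	/* Identical hosts must always be compatible. */
	assert(src != dst || ok);
	return ok;
}
```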


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.58 28-Jun-2020 pd

vmd(8): Eliminate libevent state corruption

libevent functions for com, pic and rtc are now only called on event_thread.
vcpu exit handlers send messages on a dev pipe and callbacks on these events do
the event management (event_add, evtimer_add, etc). Previously, libevent state
was mutated by two threads, event_thread, that runs all the callbacks and the
vcpu thread when running exit handlers. This could have lead to libevent state
corruption.

Patch from Dave Voutila <dave@sisu.io>

ok claudio@
tested by abieber@ and brynet@


Revision tags: OPENBSD_6_7_BASE
# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a round about way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when hanlding the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intented to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vpcus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to sets the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end continually de-queueing and re-queueing the
event continually. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.
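
The first bug above, a narrow port read clobbering the high bits of %eax, can be sketched as follows. The function name and signature are assumptions made for illustration and are not taken from vmd; the point is only that a 1- or 2-byte read should replace the low bits and leave the rest of the register untouched.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from vmd): merge the result of a 1-, 2- or
 * 4-byte port read into the guest's %eax, preserving the bits that the
 * access does not cover.
 */
uint32_t
merge_in_data(uint32_t eax, uint32_t data, unsigned int size)
{
	uint32_t mask;

	switch (size) {
	case 1:
		mask = 0xffU;
		break;
	case 2:
		mask = 0xffffU;
		break;
	default:
		mask = 0xffffffffU;
		break;
	}
	/* Keep the untouched high bits, replace only the accessed bits. */
	return (eax & ~mask) | (data & mask);
}
```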


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.57 30-Apr-2020 pd

vmd(8): correctly terminate vm processes after sending vm

Instead of a roundabout way of sending a message to vmm that 'send is
successful' and terminating by vm_remove from vmm, we can send the imsg and
exit in the vm process. The sigchld handler in vmm will vm_remove it from its
structures. This is how a normal vm is terminated as well.

Previously, vm_remove was called in vmm_dispatch_vm (ie. the event handler to
receive messages from vm process) when handling the IMSG_VMDOP_SEND_VM_RESPONSE
(ie. the vm process has written the vm state to the fd passed on by vmctl
send). This is not how vm_remove was intended to be used as it does a
free(vm). The vm struct holds the buffers for imsg and so after handling this
IMSG_VMDOP_SEND_VM_RESPONSE message, vmm_dispatch_vm loops again to do
imsg_get(ibuf, &imsg) to read the next message (and we had just freed this
*ibuf when we freed the vm struct) causing it to segfault.

reported by kn@
ok kn@


# 1.56 21-Apr-2020 pd

vmd: improve concurrency control in pause

Previous implementation hit a deadlock sometimes as the pthread_cond_broadcast
for the pause mutex could happen before pthread_cond_wait. This implementation
uses a barrier which is hit when all vcpus are paused.

ok mpi@


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to set the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
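
A minimal sketch of the convention described above, using open(2) and close(2). The function name is hypothetical and the code is not taken from vmd; it only shows the "== -1" comparison, with errno read solely on that path.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical example: open a path read-only and close it again.
 * Returns 0 on success or the errno value of the failing call.
 */
int
probe_path(const char *path)
{
	int fd;

	/* Compare against -1 exactly, not "< 0". */
	if ((fd = open(path, O_RDONLY)) == -1)
		return errno;
	if (close(fd) == -1)
		return errno;
	return 0;
}
```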


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
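
A minimal sketch of tracking state in one bitfield, as described above. The flag names and values are hypothetical and are not vmd's actual definitions; the sketch only shows that pausing and unpausing become single bit operations on one variable.

```c
#include <stdint.h>

/* Hypothetical flag names; vmd's real definitions may differ. */
#define ST_RUNNING	0x01U
#define ST_PAUSED	0x02U
#define ST_SHUTDOWN	0x04U

/* Mark the vm as paused without disturbing the other state bits. */
uint32_t
state_pause(uint32_t st)
{
	return st | ST_PAUSED;
}

/* Clear the paused bit, again leaving the other bits alone. */
uint32_t
state_unpause(uint32_t st)
{
	return st & ~ST_PAUSED;
}
```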


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect the same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@
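
A minimal sketch of the readback described above. The macro and function names are assumptions and this is not vmd's code; it only shows bit 5 of the value read from port 0x61 reflecting the channel 2 status, with the remaining bits passed through unchanged.

```c
#include <stdint.h>

#define PORT61_TMR2_OUT	0x20	/* bit 5: PIT channel 2 status */

/*
 * Hypothetical helper: build the byte returned for a read of port
 * 0x61 from the previously latched value and the channel 2 status.
 */
uint8_t
port61_read(uint8_t latched, int ch2_fired)
{
	uint8_t v = latched & (uint8_t)~PORT61_TMR2_OUT;

	if (ch2_fired)
		v |= PORT61_TMR2_OUT;
	return v;
}
```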


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
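
The idea behind vaddr_mem() can be sketched as below. The struct layout and function name are hypothetical and do not reproduce vmd's implementation; the sketch only shows a guest physical address being translated to a pointer in the daemon's address space, under the same assumption stated above that the requested span sits inside a single memory range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range. */
struct mem_range {
	uint64_t  gpa;	/* guest physical start address */
	size_t    size;	/* length of the range in bytes */
	uint8_t  *hva;	/* where the range is mapped in this process */
};

/*
 * Return a pointer for [gpa, gpa + len), or NULL when the span is not
 * fully contained in one range.
 */
void *
gpa_to_hva(const struct mem_range *r, size_t nranges, uint64_t gpa,
    size_t len)
{
	size_t i;
	uint64_t off;

	for (i = 0; i < nranges; i++) {
		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		/* Reject spans that start or end outside this range. */
		if (off >= r[i].size || len > r[i].size - off)
			continue;
		return r[i].hva + off;
	}
	return NULL;
}
```

Because the caller gets a direct pointer, data such as a virtio ring can be read in place rather than copied through a bounce buffer.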


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@
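
The "upward only" rule can be illustrated with a simple subset test on feature bits. This is a sketch under the assumption that features are compared as bitmasks; the function name is hypothetical and the real check in vmd may inspect several CPUID leaves rather than one word.

```c
#include <stdint.h>

/*
 * Hypothetical check: a vm may be received only if every feature bit
 * the sending host exposed is also present on the receiving host.
 */
int
features_compatible(uint32_t src, uint32_t dst)
{
	return (src & ~dst) == 0;
}
```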


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
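
The rate calculation described above can be approximated as follows. This is a sketch, not the ns8250 code: it assumes 10 bits on the wire per character (8N1 framing) and a caller-supplied tick rate, and the function name is hypothetical. At 115200 baud that is roughly 11520 characters per second.

```c
/*
 * Hypothetical sketch: number of characters a UART may emit per timer
 * tick at the programmed baud rate, assuming 10 bits per character.
 */
unsigned int
chars_per_tick(unsigned int baud, unsigned int ticks_per_sec)
{
	const unsigned int bits_per_char = 10;
	unsigned int n;

	if (ticks_per_sec == 0)
		return 1;
	n = baud / (bits_per_char * ticks_per_sec);
	/* Always allow at least one character so output makes progress. */
	return n == 0 ? 1 : n;
}
```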


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.55 08-Apr-2020 pd

vmm(4): add IOCTL handler to sets the access protections of the ept

This exposes VMM_IOC_MPROTECT_EPT which can be used by vmd to lock in physical
pages. Currently, vmd just terminates the vm in case it gets a protection fault
in the future.

This feature is used by solo5 which uses vmm(4) as a backend hypervisor.

ok mpi@

Patch from Adam Steen <adam@adamsteen.com.au>


# 1.54 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


# 1.53 30-Nov-2019 mlarkin

Revert previous - the stability was not as improved as we had thought and
we ended up accidentally breaking vmctl. This will need more thought.

ok ori@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
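
A minimal sketch of the convention described above, in C. The helper
name read_byte_checked is hypothetical and does not come from vmd; it
only shows the error test written as "== -1" rather than "< 0", with
errno consulted solely in that case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from vmd): read one byte from fd.
 * Returns 0 on success, 1 on end-of-file, and -1 on error with
 * *saved_errno set. errno is only looked at when read(2) returned -1.
 */
int
read_byte_checked(int fd, unsigned char *out, int *saved_errno)
{
	ssize_t n;

	assert(out != NULL && saved_errno != NULL);

	n = read(fd, out, 1);
	if (n == -1) {		/* the only value that signals failure */
		*saved_errno = errno;
		return (-1);
	}
	return (n == 0 ? 1 : 0);
}
```

Calling it with an invalid descriptor such as -1 takes the error branch
and reports EBADF through saved_errno.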


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
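
The single-bitfield approach can be sketched as follows. All names here
(EX_STATE_*, struct ex_vm, ex_set and friends) are hypothetical and
chosen for illustration; the flag names and values vmd actually uses are
not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names and values; vmd's real ones may differ. */
#define EX_STATE_RUNNING	0x01u
#define EX_STATE_PAUSED		0x02u
#define EX_STATE_SHUTDOWN	0x04u

struct ex_vm {
	uint32_t state;		/* one bit per state flag */
};

/* Turn a state flag on without disturbing the others. */
void
ex_set(struct ex_vm *vm, uint32_t flag)
{
	assert(vm != NULL);
	vm->state |= flag;
}

/* Turn a state flag off without disturbing the others. */
void
ex_clear(struct ex_vm *vm, uint32_t flag)
{
	assert(vm != NULL);
	vm->state &= ~flag;
}

/* Return 1 if the flag is set, 0 otherwise. */
int
ex_test(const struct ex_vm *vm, uint32_t flag)
{
	assert(vm != NULL);
	return ((vm->state & flag) != 0);
}
```

Keeping every state in one word means a VM can be reported as, say,
running and paused at once with a single field, and new states only
need a new bit.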


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
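
A rough sketch of the address translation idea behind vaddr_mem(), under
stated assumptions: struct ex_mem_range and ex_vaddr_mem are hypothetical
stand-ins, not vmd's actual structures or signature. The sketch returns
NULL when the requested span does not fit inside one range, which mirrors
the single-range assumption mentioned in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range (not vmd's struct). */
struct ex_mem_range {
	uint64_t gpa;	/* guest physical start address */
	size_t	 size;	/* length of the range in bytes */
	uint8_t	*hva;	/* where the range is mapped in our address space */
};

/*
 * Return a pointer usable by the host process for the guest physical
 * span [gpa, gpa + len), or NULL if the span is not fully contained in
 * a single range.
 */
void *
ex_vaddr_mem(const struct ex_mem_range *r, size_t nranges, uint64_t gpa,
    size_t len)
{
	uint64_t off;
	size_t i;

	assert(r != NULL || nranges == 0);

	for (i = 0; i < nranges; i++) {
		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		if (off >= r[i].size)
			continue;	/* gpa lies beyond this range */
		if (len > r[i].size - off)
			return (NULL);	/* span would cross the range end */
		return (r[i].hva + off);
	}
	return (NULL);
}
```

With a pointer like this, a structure such as a virtio ring can be read
in place instead of being copied into a bounce buffer first.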


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@
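
The "upward only" rule can be illustrated with a subset test on feature
masks. This is an assumption-laden sketch: ex_features_compatible is a
hypothetical name, and the 32-bit masks merely stand in for CPUID
feature words; how vmd actually compares host features is not shown in
this log.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 when every feature bit the sending host exposed is also
 * available on the receiving host, 0 otherwise. A receiver with extra
 * features is fine; a receiver missing any sender feature is not.
 */
int
ex_features_compatible(uint32_t sender, uint32_t receiver)
{
	return ((sender & ~receiver) == 0);
}
```

In the tested case above, the older CPU's feature set is a subset of the
newer one, so the check passes in that direction only.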


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
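
The arithmetic behind such rate limiting might look like the sketch
below. It is not the ns8250 code from vmd: the function names are
hypothetical, the base rate of 115200 assumes the standard PC UART
clock, and 10 bits per character assumes 8N1 framing (start bit, 8 data
bits, stop bit).

```c
#include <assert.h>
#include <stdint.h>

#define EX_UART_BASE_BAUD	115200u	/* assumed: 1.8432 MHz clock / 16 */
#define EX_BITS_PER_CHAR	10u	/* assumed 8N1 framing */

/* Baud rate selected by a divisor latch value; 0 is treated as invalid. */
uint32_t
ex_baud_from_divisor(uint16_t divisor)
{
	if (divisor == 0)
		return (0);
	return (EX_UART_BASE_BAUD / divisor);
}

/*
 * Number of characters that may be sent between two pauses of 1/hz
 * seconds so that average output roughly matches the baud rate.
 * Never returns less than 1, so output always makes progress.
 */
uint32_t
ex_chars_per_tick(uint32_t baud, uint32_t hz)
{
	uint32_t n;

	assert(hz != 0);
	n = (baud / EX_BITS_PER_CHAR) / hz;
	return (n == 0 ? 1 : n);
}
```

Under these assumptions and a 100 Hz tick, 115200 baud allows 115
characters per tick while 9600 baud allows only 9, which is why the
faster setting gives a noticeably better console.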


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to an LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
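
The single-range assumption mentioned above can be illustrated with a small sketch. This is not the actual vaddr_mem() implementation: struct mem_range and gpa_to_host() are hypothetical names, and the real function works on vmd's own memory range structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory range mapped into the host. */
struct mem_range {
	uint64_t gpa;	/* guest physical start address */
	size_t size;	/* length in bytes */
	uint8_t *hva;	/* where the range is mapped in this process */
};

/*
 * Translate [gpa, gpa + len) to a host pointer. Returns NULL unless the
 * whole span lies inside a single range.
 */
void *
gpa_to_host(const struct mem_range *r, size_t nranges, uint64_t gpa, size_t len)
{
	uint64_t off;
	size_t i;

	for (i = 0; i < nranges; i++) {
		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		if (off < r[i].size && len <= r[i].size - off)
			return r[i].hva + off;
	}
	return NULL;
}
```

A span that crosses a range boundary yields NULL here, so a caller would have to fall back to copying in that case.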


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
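
As a rough illustration of the rate calculation described above, the number of characters allowed between pauses could be derived like this. It is a sketch under stated assumptions, not the formula used in the ns8250 module: chars_per_interval() is a hypothetical helper, and it assumes 10 bits on the wire per character (8N1 framing) with hz pause intervals per second.

```c
/*
 * Characters that fit in one pause interval at a given baud rate.
 * Assumes 10 bits per character (8N1) and hz intervals per second;
 * always allows at least one character so output cannot stall.
 */
unsigned int
chars_per_interval(unsigned int baud, unsigned int hz)
{
	unsigned int n;

	if (hz == 0)
		return 1;
	n = baud / 10 / hz;
	return n > 0 ? n : 1;
}
```

With these assumptions and hz = 100, 9600 baud allows 9 characters per interval and 115200 baud allows 115, which matches the advice above to prefer 115200.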


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.52 29-Nov-2019 mlarkin

Fix at least one cause of VMs spinning at 100% host CPU

After debugging with ori@, it looks like an event ends up on the wrong
libevent queue, and we end up continually de-queueing and re-queueing the
event. While it's unclear exactly why this happened, a clue
on libevent's github issues page for the same problem pointed us to using
a different event base for the device events. This seems to have unstuck
ori@'s problematic VM, and I have also seen no more hangs after this.

We have not completely separated the queues; ori@ will work on setting
new libevent bases for those later. But those events are pretty
frequency.

with help from and ok ori@


Revision tags: OPENBSD_6_6_BASE
# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.
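
A minimal sketch of the stricter check, using close(2) as the example system call; close_checked() is a hypothetical helper, not a function in vmd.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Close fd and return 0 on success or the errno value on failure.
 * The failure test is "== -1" rather than "< 0", since -1 is the only
 * return value for which errno is defined to be set.
 */
int
close_checked(int fd)
{
	if (close(fd) == -1)
		return errno;
	return 0;
}
```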


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
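
The single-bitfield approach can be sketched as below. The flag names, struct vm_sketch, and the helper functions are hypothetical and chosen for illustration; they are not the identifiers used in vmd.

```c
/* Hypothetical state bits; the names are illustrative, not vmd's. */
#define ST_RUNNING	0x01u
#define ST_PAUSED	0x02u
#define ST_SHUTDOWN	0x04u

struct vm_sketch {
	unsigned int state;	/* OR of the ST_* bits above */
};

/* Turn on the given state bits. */
void
state_set(struct vm_sketch *vm, unsigned int bits)
{
	vm->state |= bits;
}

/* Turn off the given state bits. */
void
state_clear(struct vm_sketch *vm, unsigned int bits)
{
	vm->state &= ~bits;
}

/* Return nonzero only if all of the given bits are set. */
int
state_has(const struct vm_sketch *vm, unsigned int bits)
{
	return (vm->state & bits) == bits;
}
```

One field replaces several separate variables, and a combined condition such as "running and paused" becomes a single mask test.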


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This also requires a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.51 17-Jul-2019 pd

vmm/vmd: Fix migration with pvclock

Implement VMM_IOC_READVMPARAMS and VMM_IOC_WRITEVMPARAMS ioctls to read and
write pvclock state.

reads ok mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where the OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.50 28-Jun-2019 deraadt

When system calls indicate an error they return -1, not some arbitrary
value < 0. errno is only updated in this case. Change all (most?)
callers of syscalls to follow this better, and let's see if this strictness
helps us in the future.


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add an x86 page table walker

Add a first cut of an x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
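
A minimal sketch of the single-bitfield idea, for illustration only. The
flag names and values below are hypothetical and are not taken from vmd;
the point is that one integer can carry several independent states that
can each be set, cleared, and tested.

```python
from enum import IntFlag


class VmState(IntFlag):
    # Hypothetical flag values, one bit per state.
    RUNNING = 0x01
    PAUSED = 0x02
    SHUTDOWN = 0x04


def pause(state: VmState) -> VmState:
    """Set the PAUSED bit without touching the other bits."""
    return state | VmState.PAUSED


def unpause(state: VmState) -> VmState:
    """Clear the PAUSED bit without touching the other bits."""
    return state & ~VmState.PAUSED


def is_paused(state: VmState) -> bool:
    """Test a single bit of the combined state."""
    return bool(state & VmState.PAUSED)
```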


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@
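
The lookup order described above can be sketched as follows. This is a toy
model, not the qcow2 code in vmd: images are plain dictionaries keyed by
cluster index, and `read_cluster` and `write_cluster` are hypothetical
names. It assumes writes only ever land in the derived image, which is why
modifying the base image afterwards would invalidate the derived one.

```python
from typing import Dict, Optional

Image = Dict[int, bytes]  # toy model: cluster index -> cluster data


def read_cluster(index: int, derived: Image, base: Image) -> Optional[bytes]:
    """Look in the derived image first, then fall back to the base image."""
    if index in derived:
        return derived[index]
    return base.get(index)  # None means allocated in neither image


def write_cluster(index: int, data: bytes, derived: Image) -> None:
    """Writes go to the derived image only; the base stays untouched."""
    derived[index] = data
```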


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
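
The address translation behind iovec_mem() can be sketched roughly as
below. This is an illustration under stated assumptions, not the vmd
implementation: `MemRange` and `iovec_for` are hypothetical names, and host
addresses are modelled as plain integers. Each guest memory range maps a
span of guest physical addresses to a host base, and a request that crosses
a range boundary is split into one (address, length) pair per range.

```python
from typing import List, NamedTuple, Tuple


class MemRange(NamedTuple):
    gpa: int   # guest physical start address of the range
    size: int  # length of the range in bytes
    base: int  # hypothetical host address where the range is mapped


def iovec_for(ranges: List[MemRange], gpa: int, length: int) -> List[Tuple[int, int]]:
    """Split [gpa, gpa + length) into (host address, length) segments."""
    iov = []
    while length > 0:
        r = next((r for r in ranges if r.gpa <= gpa < r.gpa + r.size), None)
        if r is None:
            raise ValueError("guest address not backed by memory")
        off = gpa - r.gpa
        n = min(length, r.size - off)  # stop at the end of this range
        iov.append((r.base + off, n))
        gpa += n
        length -= n
    return iov
```

A list built this way has the shape that readv and writev expect, so data
can move between guest memory and a disk image without an intermediate
copy.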


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This refuses to receive VMs sent from hosts with more CPU features than the receiver.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@
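
One way to picture the upward-only rule is as a subset test on feature
bitmasks. The sketch below is an assumption about the general shape of such
a check, not the code in vmd; `can_receive` and the feature bits are
hypothetical.

```python
# Hypothetical feature bits for illustration only.
FEAT_A = 0x1
FEAT_B = 0x2


def can_receive(sender_features: int, receiver_features: int) -> bool:
    """Accept only if every feature of the sender exists on the receiver."""
    return (sender_features & ~receiver_features) == 0
```

With an older host advertising only FEAT_A and a newer host advertising
both bits, a VM can move from the older host to the newer one but not back.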


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
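
The arithmetic behind the rate limit can be sketched as follows. The
numbers are assumptions for illustration, not values from the ns8250
module: 10 bits on the wire per character (8N1 framing) and a tick rate of
100 per second. `chars_per_tick` is a hypothetical helper name.

```python
def chars_per_tick(baud: int, hz: int = 100, bits_per_char: int = 10) -> int:
    """Approximate how many characters fit into one 1/hz tick at this baud rate."""
    chars_per_second = baud // bits_per_char
    return max(1, chars_per_second // hz)
```

Under these assumptions 9600 baud allows about 9 characters per tick and
115200 baud about 115, which matches the advice above to select a higher
rate for a more responsive console.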


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
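
The /31 layout can be sketched with Python's ipaddress module. The prefix
used in the usage note and the choice of which address goes to the host
side are assumptions for illustration; `local_interface_pair` is a
hypothetical helper, not part of vmd.

```python
import ipaddress


def local_interface_pair(prefix: str):
    """Return the two addresses of a /31: (tap(4) side, VM side)."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen != 31:
        raise ValueError("expected a /31 prefix")
    host_ip, guest_ip = list(net)  # a /31 holds exactly two addresses
    return host_ip, guest_ip
```

For example, `local_interface_pair("100.64.1.2/31")` yields 100.64.1.2 for
the host side and 100.64.1.3 as the address handed to the VM.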


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.49 28-May-2019 pd

vmd: unset CR0_CD and CR0_NW in default flat64 register values

These never got unset on AMD/SVM guests when booted via vmctl start
-b causing them to run very slow

ok mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where the OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.48 12-May-2019 pd

vmm: add a x86 page table walker

Add a first cut of x86 page table walker to vmd(8) and vmm(4). This function is
not used right now but is a building block for future features like HPET, OUTSB
and INSB emulation, nested virtualisation support, etc.

With help from Mike Larkin

ok mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will makes it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/recieve header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (don't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to an LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. The use of a timeout and VM ACKs makes sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch; the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.47 11-May-2019 jasper

vm_dump_header allocated space for a signature but it was never set;
set it to VMM_HV_SIGNATURE and check for it upon restoring a vm image

ok mlarkin@ pd@


# 1.46 11-May-2019 jasper

track the state of the vm (running, paused, etc) using a single bitfield instead of
a handful of separate variables. this will make it easier for vmd to report
and check on the individual vm states

no functional change intended

ok ccardenas@ mlarkin@
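
As a rough illustration of tracking state in a single bitfield, the sketch below uses one flag bit per condition. The flag names, the struct, and the helpers are hypothetical and chosen only for this example; the actual definitions in vmd may differ.

```c
#include <stdbool.h>

/* Hypothetical flag names; the real vmd definitions may differ. */
#define VM_STATE_RUNNING	0x01
#define VM_STATE_PAUSED		0x02
#define VM_STATE_SHUTDOWN	0x04

/* Hypothetical container: one bit per condition instead of one variable each. */
struct vm_example {
	unsigned int state;
};

static void
vm_state_set(struct vm_example *vm, unsigned int flag)
{
	vm->state |= flag;
}

static void
vm_state_clear(struct vm_example *vm, unsigned int flag)
{
	vm->state &= ~flag;
}

static bool
vm_state_has(const struct vm_example *vm, unsigned int flag)
{
	return (vm->state & flag) != 0;
}
```

Because every condition lives in the same word, reporting or checking several states becomes a mask operation rather than a comparison against several separate variables.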


Revision tags: OPENBSD_6_5_BASE
# 1.45 01-Mar-2019 mlarkin

vmd(8): remove some i386 remnants that missed the original cleanup

ok pd, kn, deraadt


# 1.44 20-Feb-2019 mlarkin

vmd(8): initialize guest %drX registers to power-on defaults on launch

Initializes the %drX registers to power on defaults, and bump the VM
send/receive header to reflect same

discussed with deraadt@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@
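
The lookup order described above (derived image first, then the base) can be sketched like this. It is a simplified model under stated assumptions: the struct image type, the fixed-size cluster table, and cluster_lookup are hypothetical stand-ins, not the qcow2 code in vmd.

```c
#include <stddef.h>

#define NCLUSTERS 4	/* tiny fixed table, for illustration only */

/* Hypothetical in-memory stand-in for a disk image; not vmd's structure. */
struct image {
	const struct image *base;		/* NULL when there is no base */
	const char *clusters[NCLUSTERS];	/* NULL = not allocated here */
};

/*
 * Walk from the derived image towards its base and return the first
 * image's data for the requested cluster, or NULL if no image has it.
 */
static const char *
cluster_lookup(const struct image *img, size_t idx)
{
	if (idx >= NCLUSTERS)
		return NULL;
	for (; img != NULL; img = img->base)
		if (img->clusters[idx] != NULL)
			return img->clusters[idx];
	return NULL;
}
```

The sketch also shows why modifying the base is dangerous: a derived image holds only the clusters written to it and relies on the base's contents for everything else.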


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This refuses to receive VMs sent from hosts with more CPU features than
the receiving host.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@
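
One way to express an "upward only" rule is a subset test on feature bits: the receive is allowed only if the receiving host has every feature the sender advertised. The sketch below assumes features are summarized in a single 32-bit mask and uses a hypothetical helper name; the actual check in vmd may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when every feature bit advertised by
 * the sending host is also present on the receiving host.
 */
static bool
features_compatible(uint32_t sender, uint32_t receiver)
{
	/* Any bit set in sender but clear in receiver blocks the receive. */
	return (sender & ~receiver) == 0;
}
```

Under this rule a host with fewer features can send to one with more, but not the other way around, matching the broadwell and skylake results listed above.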


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.43 10-Dec-2018 claudio

Implement the fw_cfg interface basics and use it to set the bootorder
if a bootdevice was forced. This implements both the pure IO port interface
and also the new DMA interface, a few direct commands are implemented which
are needed but in general the "file" interface should be used. There is no
write support for the guest. Tested against the latest vmm-firmware port.
This requires also a -current kernel to pass the IO ports to vmd(8).
OK mlarkin@ ccardenas@


# 1.42 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


Revision tags: OPENBSD_6_4_BASE
# 1.41 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.40 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.39 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmd(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.38 17-Jul-2018 mlarkin

vmd(8): fix vmctl -b option for i386 kernels.

ok pd@


# 1.37 12-Jul-2018 mlarkin

vmm(8)/vmm(4): send a copy of the guest register state to vmd on exit,
avoiding multiple readregs ioctls back to vmm in case register content
is needed subsequently.

ok phessler


# 1.36 10-Jul-2018 mlarkin

vmd(8): route ELCR handler to the right function


# 1.35 09-Jul-2018 mlarkin

vmd(8): better debug message in a failure case


# 1.34 19-Jun-2018 reyk

knf


# 1.33 27-Apr-2018 mlarkin

vmd(8): implement vmd side of ELCR registers

ok guenther


# 1.32 26-Apr-2018 mlarkin

vmd(8): handle PIT channel 2 status readback via port 0x61

Allow PIT channel 2 status (fired/counting) readback via port 0x61
bit 5.

ok guenther@


Revision tags: OPENBSD_6_3_BASE
# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re-add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd-usable pointer based on a guest's physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This restricts receiving vms from hosts with more cpu features.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to an LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@


# 1.31 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


# 1.30 29-Nov-2017 mlarkin

make vmm(4) less responsible for initial register state, preferring to let
usermode daemons handle that.

ok pd@


# 1.29 28-Nov-2017 mlarkin

fix some spelling errors in a few comments


Revision tags: OPENBSD_6_2_BASE
# 1.28 19-Sep-2017 mlarkin

Clarify a wrong conditional, found by jsg.

ok jsg


# 1.27 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.26 17-Sep-2017 pd

vmd: re add rtc.per and rtc.sec evtimers on receive

This was missed in receive. mc146818_start is already defined. This fixes rtc
time resync on receive.

ok mlarkin@


# 1.25 11-Sep-2017 dlg

add functions to provide direct access to guest memory as vmd addresses

iovec_mem() populates an iovec array based on guest physical
addresses. this allows the use of things like readv and writev for
moving data between the guest and a disk image file without having
to bounce the memory.

vaddr_mem() provides a vmd usable pointer based on a guests physical
address. this makes it possible to directly reference things like
virtio rings without having to bounce that memory either. however,
it assumes that a contiguous range of guest physical memory will
sit in a single vm memory range. mlarkin@ says this is right.

ok mlarkin@
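
The single-range assumption described for vaddr_mem() can be sketched
as below. This is an illustration only, not vmd's code: struct
mem_range and gpa_to_host() are hypothetical names, and the layout of
the range table is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range descriptor; not vmd's actual structure. */
struct mem_range {
	uint64_t gpa;	/* guest physical base address */
	size_t	 size;	/* length of the range in bytes */
	uint8_t	*hva;	/* host address backing the range */
};

/*
 * Return a host pointer for the guest span [gpa, gpa + len), or NULL
 * if gpa is unmapped or the span does not fit in a single range.
 */
static void *
gpa_to_host(const struct mem_range *r, size_t n, uint64_t gpa, size_t len)
{
	uint64_t off;
	size_t i;

	for (i = 0; i < n; i++) {
		if (gpa < r[i].gpa)
			continue;
		off = gpa - r[i].gpa;
		if (off >= r[i].size)
			continue;
		if (len > r[i].size - off)
			return NULL;	/* would cross the range boundary */
		return r[i].hva + off;
	}
	return NULL;
}
```

Rejecting a span that crosses a range boundary keeps the returned
pointer safe to use for the whole length without bouncing.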


# 1.24 20-Aug-2017 pd

vmd: Allow only upward migration

This prevents receiving vms sent from hosts with more cpu features
than the receiving host.

Tested on
broadwell -> skylake (works)
skylake -> broadwell (doesn't work)

ok mlarkin@
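
A minimal sketch of an "upward only" check, assuming cpu features are
compared as plain bitmasks. The function name and the single 32-bit
mask are hypothetical; the commit does not say which cpuid leaves are
compared or how.

```c
#include <stdint.h>

/*
 * Accept the vm only if every feature bit the sending host offered
 * is also present on the receiving host (sender is a subset).
 */
static int
features_compatible(uint32_t sender, uint32_t receiver)
{
	return (sender & ~receiver) == 0;
}
```

With this rule a vm moves from a host with fewer features to one with
more, but not the other way around.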


# 1.23 14-Aug-2017 mlarkin

vmd: set MSR_MISC_ENABLE=0 on vm creation, this will be re-set in vmm
based on proper values from the host in use.


# 1.22 15-Jul-2017 pd

Add vmctl send and vmctl receive

ok reyk@ and mlarkin@


# 1.21 09-Jul-2017 pd

vmd/vmctl: Add ability to pause / unpause vms

With help from Ashwin Agrawal

ok reyk@ mlarkin@


# 1.20 07-Jun-2017 mlarkin

vmd: Implement simulated baudrate support in the ns8250 module. The
previous version was allowing an output rate that is "too fast", and linux
guests would give up after 512 characters TXed ("too much work for irq4").

This diff calculates the approximate rate we can sustain at the current
programmed baud rate and limits the output to that rate by inserting a
HZ delay after a specified number of characters have been transmitted.
This fixes the linux guest console issue.

Note that the console now outputs at more or less the selected baud rate,
instead of nearly instantaneously as before - if you selected 9600 in
your guest VMs before, you might want to change that to 115200 now for a
better console experience.

krw@ "seems like a good idea to me"
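
The rate limit described above can be approximated as follows. This is
a sketch, not the ns8250 module's code: the 10 bits per character
(8N1 framing) and the 100 pauses per second are assumptions, and the
function and macro names are hypothetical. The divisor formula is the
usual one for a UART clocked at 1.8432 MHz.

```c
#define UART_MAX_BAUD	115200	/* baud rate at divisor 1 */
#define BITS_PER_CHAR	10	/* assumed: start + 8 data + stop */
#define PAUSES_PER_SEC	100	/* assumed pause frequency */

/* Convert the programmed divisor latch value to a baud rate. */
static unsigned int
divisor_to_baud(unsigned int divisor)
{
	if (divisor == 0)
		return 0;
	return UART_MAX_BAUD / divisor;
}

/* Characters that may be sent before one pause is inserted. */
static unsigned int
chars_per_pause(unsigned int baud)
{
	unsigned int n = baud / BITS_PER_CHAR / PAUSES_PER_SEC;

	return n == 0 ? 1 : n;	/* always allow some progress */
}
```

Under these assumptions 9600 baud allows 9 characters per pause and
115200 baud allows 115, which matches the advice to pick the faster
setting.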


# 1.19 30-May-2017 tedu

split vioblk read/write functions into start and finish as prep for
async io operations. ok mlarkin


# 1.18 28-May-2017 mlarkin

SVM: add some exit types

Also, fix a comment that wasn't applicable anymore, and change a format
from decimal to hex


# 1.17 05-May-2017 reyk

VMs cannot use proc_compose() to PROC_VMM, they have to use
imsg_compose() on the "vmm_pipe" directly. This fixes the
communication channel from VMs back to vmm.


# 1.16 05-May-2017 mlarkin

Allow vmd(8) to set guest %xcr0

Usermode part of previous vmm(4) diff.

Posted to tech by Pratik Vyas


# 1.15 02-May-2017 mlarkin

fix an error in i386 vmd build


# 1.14 02-May-2017 mlarkin

Matching vmd(8) part of previous diff (first part of vmctl send/receive).

ok kettenis


# 1.13 25-Apr-2017 reyk

spacing


# 1.12 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
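
A /31 holds exactly two addresses that differ only in the lowest bit,
so the second address can be derived from the first. The sketch below
assumes IPv4 addresses in host byte order; the helper names are
hypothetical, and which of the two addresses goes to the host or to
the VM is not stated in the commit.

```c
#include <stdint.h>

/* The other address in the same /31 differs only in the lowest bit. */
static uint32_t
slash31_peer(uint32_t addr)
{
	return addr ^ 1U;
}

/* True if both addresses fall in the same /31 subnet. */
static int
same_slash31(uint32_t a, uint32_t b)
{
	return (a & 0xfffffffeU) == (b & 0xfffffffeU);
}
```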


Revision tags: OPENBSD_6_1_BASE
# 1.11 27-Mar-2017 deraadt

die whitespace die die die


# 1.10 25-Mar-2017 mlarkin

Last bits needed to get seabios + alpine linux working. This is enough
to get started and let more people help finding and fixing bugs.

ok kettenis, deraadt


# 1.9 25-Mar-2017 reyk

Boot using BIOS from /etc/firmware/vmm-bios by default.

Instead of using the internal "vmboot", VMs will now be booted using
the external BIOS firmware in /etc/firmware/vmm-bios (which is subject
to a LGPLv3 license). Direct booting of OpenBSD kernels or
non-default BIOS images is still supported for now using the -b/boot
option that is replacing the -k/kernel option.

As requested by Theo, vmd(8) fails if neither the default BIOS is
found nor a kernel has been specified in the VM configuration. The
"vmm" BIOS has to be installed using fw_update(1), which will be done
automatically in most cases where OpenBSD can fetch it after
install/upgrade.

OK mlarkin@


# 1.8 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.7 25-Mar-2017 mlarkin

Introduce a new function to obtain properly sized input data, and convert
i8253/i8259/mc146818 emulation to use this.


# 1.6 24-Mar-2017 mlarkin

Allow vmd to proceed after an interrupt occurred after retiring a cpuid
instruction. Matches previous commit to kernel vmm.c


# 1.5 23-Mar-2017 mlarkin

Implement memory size and SMP CPU count NVRAM registers in the emulated
mc146818. This is needed for seabios to boot properly (and construct
a sensible e820 map to send to the guest OS).


# 1.4 21-Mar-2017 mlarkin

Fix two errors in NS8250 (UART) emulation. The first error zeroed out the
high bits of %eax on reading register data from the emulated UART ports.
The second error didn't properly assert the TXRDY bit during init -
this bit was only set after the first character was sent. Both these
bugs caused seabios to not be able to output any data. Found during the
recent effort to get Linux guests booting.


# 1.3 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. The use of timeouts and VM ACKs makes sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.2 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch; the
implementation is independent of the underlying switch or bridge.

no objections mlarkin@


# 1.1 01-Mar-2017 reyk

Split vmm.c into two files: vm.c for the VM child, vmm.c for the parent

As discussed with mlarkin@, it makes it easier to maintain the file.

OK mlarkin@