History log of /openbsd-current/usr.sbin/vmd/virtio.h
Revision Date Author Comments
# 1.51 20-Feb-2024 dv

Utilize separate threads for RX and TX in vmd(8)'s vionet.

This commit adds multithreading to allow both virtqueues to be
processed in parallel along with additional synchronization primitives
to protect device configuration state. Allowing RX and TX to operate
independently reduces overall network latency for guests and helps
alleviate the TX side dominating cpu time.

Tested with help from phessler@, kn@, and mlarkin@. ok mlarkin@.


# 1.50 30-Jan-2024 dv

Rewrite vmd(8)'s vionet to be zero-copy.

Similar to the rewrite of the virtio block device to use zero-copy
semantics, this rewrites how the virtio network device works with
the virtqueue ring buffers to minimize data copying. For guests
that don't use the built-in DNS and mac filtering capabilities,
data can now be transferred to/from the virtqueue and the tap(4)
directly without temporary buffers.

A lot of the virtio semantics are cleaned up as well, including
proper error states.

Tested with help by mbuhl@, friehm@, mlarkin@, and others.

"go for it," mlarkin@


Revision tags: OPENBSD_7_4_BASE
# 1.49 26-Sep-2023 dv

vmd(8): disambiguate log messages per vm and device.

The logging output from vmd(8) often specifies the function performing
the logging, but leaves which vm or vm device to guesswork and
reading tea leaves.

Change the logging formatting to prefix with information about the
specific vm and potentially the device subprocess. Most of this
logging is behind the "verbose" mode, but for warnings this will
clarify which vm or device logged the warning.

The format of vm/<name>/<device><index> is chosen to be concise and
less ugly than other approaches. This adjusts the process naming
for devices to match, dropping the use of brackets.

In the process of this change, updating log settings dynamically
via vmctl(8) is fixed by properly broadcasting that information to
the device subprocesses. The "vmm" process also now updates its own
state properly, so settings survive vm reboots.

ok mlarkin@


# 1.48 14-Sep-2023 dv

vmd(8)/vioblk: use zero-copy approach & vectored io.

The original version of the virtio block device dynamically allocated
buffers to hold intermediate data when reading or writing to the
underlying disk fd(s). Since vioblk drivers may chain multiple
segments together, this leads to overly complex logic and one
read(2)/write(2) call per data segment.

Additionally, the virtio block logic in vmd didn't handle segments
that weren't block aligned (e.g. 512 bytes). If a guest provided
unaligned segments, garbage would be read or written.

Since virtio descriptors mimic iovec structures, this changes vmd's
device emulation to use that model. (This is how other hypervisors
emulate virtio devices.) This allows for zero-copy semantics using
iovec's, reducing memcpy and multiple read/write syscalls per io
transaction.
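
As an illustration of the descriptor-to-iovec idea (a minimal sketch, not
taken from the vmd source), the helper below walks a descriptor chain and
points one iovec at each segment. The names chain_to_iov, struct desc and
DESC_F_NEXT are hypothetical, and the sketch assumes guest memory is a
single flat mapping in which guest-physical address 0 is mem[0]; real
address translation is more involved.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define DESC_F_NEXT 1	/* "next" flag of a split-ring descriptor */

/* Split-ring descriptor layout as described by the virtio spec. */
struct desc {
	uint64_t addr;	/* guest-physical address of the segment */
	uint32_t len;	/* segment length in bytes */
	uint16_t flags;
	uint16_t next;	/* index of the next descriptor when chained */
};

/*
 * Hypothetical helper: walk the chain starting at head and point one
 * iovec at each segment inside the (assumed flat) guest mapping.
 * Returns the number of iovecs filled, or -1 on a malformed chain.
 */
static int
chain_to_iov(const struct desc *table, uint16_t qsize, uint16_t head,
    uint8_t *mem, uint64_t memlen, struct iovec *iov, int iovmax)
{
	uint16_t idx = head;
	int n = 0;

	for (;;) {
		const struct desc *d;

		/* Bound the walk so a looping chain cannot spin forever. */
		if (idx >= qsize || n >= iovmax || n >= qsize)
			return (-1);
		d = &table[idx];
		/* Reject segments that fall outside guest memory. */
		if (d->addr > memlen || d->len > memlen - d->addr)
			return (-1);
		iov[n].iov_base = mem + d->addr;
		iov[n].iov_len = d->len;
		n++;
		if (!(d->flags & DESC_F_NEXT))
			return (n);
		idx = d->next;
	}
}
```

The filled array could then be handed to a single preadv(2) or pwritev(2)
call against the disk fd, which is where the saving in copies and
syscalls would come from.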

Testing by phessler@ and mlarkin@. OK mlarkin@.


# 1.47 06-Sep-2023 dv

vmd(8): clean up struct ioinfo.

In prep for fixing some vioblk device issues, simplify the ioinfo
struct by dropping members that aren't needed.

ok mlarkin@


# 1.46 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up an
unused TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.45 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.44 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


Revision tags: OPENBSD_7_3_BASE
# 1.43 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.
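
To make the pfn/address distinction concrete, here is a small sketch (not
vmd's code). It assumes the 4096-byte queue page size of the legacy
virtio-pci interface; vq_pfn_to_gpa and VQ_PAGE_SHIFT are hypothetical
names chosen for this example.

```c
#include <stdint.h>

#define VQ_PAGE_SHIFT 12	/* assumes 4096-byte legacy virtio-pci pages */

/*
 * Convert the 32-bit page frame number written by the driver into a
 * guest-physical address.
 */
static uint64_t
vq_pfn_to_gpa(uint32_t pfn)
{
	/* Widen before shifting so large frame numbers do not overflow. */
	return ((uint64_t)pfn << VQ_PAGE_SHIFT);
}
```

Treating the 32-bit register value as an address directly would point at
the wrong location for any queue not at page 0, which is why the naming
matters.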

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many


Revision tags: OPENBSD_7_2_BASE
# 1.42 04-May-2022 dv

vmctl(8)/vmd(8): convert disk sizes from MB to bytes

Continue converting other parts to storing data in bytes instead
of MB. In this case, the logic for disk sizes was being scaled.

This fixes issues reported by Martin Vahlensieck where vmctl could
no longer create disks larger than 7 MiB after previous commits to
change storing memory sizes as bytes.

While this keeps the vm memory limit check in vmctl's size parser,
it skips the limit check for disks. The error messages adjust
accordingly and this removes the double error message logging.

Update comments and function types accordingly.

ok mlarkin@


Revision tags: OPENBSD_7_0_BASE OPENBSD_7_1_BASE
# 1.41 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.40 21-Jun-2021 dv

vmd(8): support variable length vionet rx descriptor chains

The original implementation of the virtio network device assumed a
driver would only provide a 2-descriptor chain for receiving packets.
The virtio spec allows for variable length chains and drivers, in
practice, construct them when they use a sufficiently large MTU.

This change lets the device use variable length chains provided by
the driver, thus allowing for drivers to set an MTU up to the
underlying host-side tap(4)'s limit of TUNMRU (16384).

Size limitations are now enforced on both tx and rx-side dropping
anything violating the underlying tap(4) min and max limits.

More work is needed to increase the read(2) buffer in use by vmd
to prevent packet truncation.

OK mlarkin@


# 1.39 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


# 1.38 21-Apr-2021 dv

Fix packet size checks and remove bad casts.

Because dhcpsz was an uninitialized ssize_t, it was possible that a
garbage "packet" would be queued on the receiving end of the virtio
network device.

Change the type to size_t and add proper checks based on it being
greater than zero. Remove the cast of ssize_t to uint64_t that also
caused garbage sizes when dhcpsz was uninitialized and set at runtime
to something < 0.
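
The pitfall can be shown in a few lines. This is a sketch with
hypothetical helper names (bad_cast, packet_size_ok), not the code from
the commit: a negative ssize_t converted to uint64_t turns into an
enormous length, while a size_t checked against zero and an upper bound
cannot.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* The pitfall: a negative ssize_t cast to uint64_t becomes a huge length. */
static uint64_t
bad_cast(ssize_t sz)
{
	return ((uint64_t)sz);
}

/* Safer shape: keep the size as size_t and accept only 0 < sz <= max. */
static int
packet_size_ok(size_t sz, size_t max)
{
	return (sz > 0 && sz <= max);
}
```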


Revision tags: OPENBSD_6_9_BASE
# 1.37 29-Mar-2021 dv

branches: 1.37.2;
Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.36 07-Jan-2021 tracey

bump VM shutdown event timeout ok mlarkin@ stsp@ florian@

VMs with additional package daemons were not given enough time to shut down
gracefully.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.35 11-Dec-2019 pd

branches: 1.35.6;
vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block, also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.34 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


# 1.33 26-Nov-2018 reyk

Move the {qcow2,raw} create functions from vmctl into vmd/vio{qcow2,raw}.c

This way they are in the appropriate place and code can be shared with vmd.

Ok ori@ mlarkin@ ccardenas@


# 1.32 19-Oct-2018 reyk

Add support to create and convert disk images from existing images

The -i option to vmctl create (eg. vmctl create output.qcow2 -i input.img)
lets you create a new image from an input file and convert it if it is a
different format. This makes it possible to create qcow2 images from raw images,
raw from qcow2, or even qcow2 from qcow2 and raw from raw to re-optimize
the disk.

This re-uses Ori's vioqcow2.c from vmd by reaching into it and
compiling it in. The API has been adjusted to be used from both vmctl
and vmd accordingly.

OK mlarkin@


Revision tags: OPENBSD_6_4_BASE
# 1.31 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.30 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.29 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.28 09-Sep-2018 ccardenas

Add initial qcow2 image support.

Users are able to declare disk images as 'raw' or 'qcow2' using either
vmctl or vm.conf. The default disk image format is 'raw' if not specified.

Examples of using disk format:

vmctl start bsd -Lc -r cd64.iso -d qcow2:current.qc2
or
vmctl start bsd -Lc -r cd64.iso -d raw:current.raw
is equivalent to
vmctl start bsd -Lc -r cd64.iso -d current.raw

in vm.conf
vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.qc2" format "qcow2"
	interface { switch "external" }
}

or

vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.raw" format "raw"
	interface { switch "external" }
}

is equivalent to

vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.raw"
	interface { switch "external" }
}

Tested by many.

Big Thanks to Ori Bernstein.


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and cleanup some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in the respective function (vioscsi_handle_xxx)
which allows more functionality to be added more easily.

No functional changes confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
  CentOS 6 (secondary), Ubuntu 17.10 (secondary).
  NOTE: Secondary indicates some issue(s) preventing full/reliable
  functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
  default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.17 08-May-2017 reyk

Adds functions to read and write state of devices in vmd.

This is required for implementing vmctl send and vmctl receive. vmctl
send / receive are two new options that will support snapshotting VMs
and migrating VMs from one host to another. The atomicio files are
copied from usr.bin/ssh.

Patch from Pratik Vyas; this project was undertaken at San Jose State
University along with his three teammates, Ashwin, Harshada and Siri
with mlarkin@ as the advisor.

OK mlarkin@


# 1.16 02-May-2017 mlarkin

Resynchronize the guest RTC via vmmci(4) on host resume from zzz/ZZZ
(vmd part)

This feature is for OpenBSD guests only.

ok reyk, kettenis


# 1.15 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.
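
A /31 holds exactly two addresses that differ only in the lowest bit, so
either end of the link can be derived from the other. The sketch below
shows just that arithmetic; slash31_peer is a hypothetical name, and
which of the two addresses vmd gives to the host and which to the VM is
not stated here.

```c
#include <stdint.h>

/* Given one address of a /31 (host byte order), return the other one. */
static uint32_t
slash31_peer(uint32_t addr)
{
	return (addr ^ 1U);
}
```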

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.14 27-Mar-2017 deraadt

die whitespace die die die


# 1.13 26-Mar-2017 mlarkin

Implement a missing command in vioblk and allow > MAXPHYS transfers.

This diff (with the others previously committed) allows ubuntu 14.04
amd64 guests to work.


# 1.12 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.11 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.10 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.9 21-Jan-2017 mlarkin

updated include paths for recently moved virtio stuff


# 1.8 19-Jan-2017 reyk

Export the host time to the guest, add it as a timedelta sensor in vmmci(4)

OK kettenis@ mlarkin@


# 1.7 13-Jan-2017 reyk

Add host side of vmmci(4) to vmd(8).

It currently uses the device to request graceful shutdown of a VM on
"vmctl stop myvm" but will be extended for reboot and other edge cases.

OK mlarkin@


# 1.6 12-Oct-2016 mlarkin

Allow 4 vio(4) interfaces in each VM. Also fix a bad interrupt assignment that
caused IRQ9 to be shared between the second disk device and the vio(4)s,
which caused poor network performance.

ok reyk, stefan


# 1.5 02-Sep-2016 stefan

Process incoming host->guest packets asynchronously to running VCPU

This registers a handler with libevent that is called on incoming packets
for the guest. If they cannot be handled immediately (because the virtq is
full), make sure they are handled on VCPU exits.

ok mlarkin@


Revision tags: OPENBSD_6_0_BASE
# 1.4 09-Jul-2016 stefan

Prepare vionet to be handled asynchronously to the VCPU thread

This splits the handling of received data into a separate function
that can later be called in parallel to the VCPU thread instead of
handling received packets on VCPU exits only.

It also makes virtq accesses in the rx path safe to run in parallel
to the VCPU thread: the last index into the 'avail' ring the driver
has notified to the host is kept track of. It also makes sure that
the host only writes back to the 'avail' ring instead of modifying
the whole receive virtq.
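
Tracking the last seen 'avail' index can be sketched as follows (an
illustration with a hypothetical helper name, vq_pending, not the code
from this commit). The ring index is a free-running 16-bit counter, so
the number of newly published buffers is the difference taken modulo
65536.

```c
#include <stdint.h>

/*
 * Number of buffers the driver has published since the device last
 * looked. The cast keeps the subtraction correct across wraparound.
 */
static uint16_t
vq_pending(uint16_t avail_idx, uint16_t last_avail)
{
	return ((uint16_t)(avail_idx - last_avail));
}
```

The device side would advance its private last_avail copy as it consumes
buffers, without needing to rewrite the driver-owned parts of the queue.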

While there, describe what virtio_vq_info and virtio_io_cfg are used
for, as suggested by mlarkin@

ok mlarkin@


Revision tags: OPENBSD_5_9_BASE
# 1.3 03-Dec-2015 reyk

spacing


# 1.2 22-Nov-2015 reyk

Add $ Ids


# 1.1 22-Nov-2015 mlarkin

vmd(8) - virtual machine daemon.

There is still a lot to be done, and fixed, in these userland components
but I have received enough "it works, commit it" emails that it's time
to finish those things in tree.

discussed with many, tested by many.


# 1.50 30-Jan-2024 dv

Rewrite vmd(8)'s vionet to be zero-copy.

Similar to the rewrite of the virtio block device to use zero-copy
semantics, this rewrites how the virtio network device works with
the virtqueue ring buffers to minimize data copying. For guests
that don't use the built-in DNS and mac filtering capabilities,
data can now be transfered to/from the virtqueue and the tap(4)
directly without temporary buffers.

A lot of the virtio semantics are cleaned up as well, including
proper error states.

Tested with help by mbuhl@, friehm@, mlarkin@, and others.

"go for it," mlarkin@


Revision tags: OPENBSD_7_4_BASE
# 1.49 26-Sep-2023 dv

vmd(8): disambiguate log messages per vm and device.

The logging output from vmd(8) often specifies the function performing
the logging, but leaves which vm or vm device to guesswork and
reading tea leaves.

Change the logging formatting to prefix with information about the
specific vm and potentially the device subprocess. Most of this
logging is behind the "verbose" mode, but for warnings this will
clarify which vm or device logged the warning.

The format of vm/<name>/<device><index> is chosen to be concise and
less ugly than other approaches. This adjusts the process naming
for devices to match, dropping the use of brackets.

In the process of this change, updating log settings dynamically
via vmctl(8) is fixed by properly broadcasting that information to
the device subprocesses. The "vmm" process also now updates its own
state properly, so settings survive vm reboots.

ok mlarkin@


# 1.48 14-Sep-2023 dv

vmd(8)/vioblk: use zero-copy approach & vectored io.

The original version of the virtio block device dynamically allocated
buffers to hold intermediate data when reading or writing to the
underlying disk fd(s). Since vioblk drivers may chain multiple
segments together, this leads to overly complex logic and on
read(2)/write(2) call per data segment.

Additionally, the virtio block logic in vmd didn't handle segments
that weren't block aligned (e.g. 512 bytes). If a guest provided
unaligned segments, garbage will be read or written.

Since virtio descriptors mimic iovec structures, this changes vmd's
device emulation to use that model. (This is how other hypervisors
emulate virtio devices.) This allows for zero-copy semantics using
iovec's, reducing memcpy and multiple read/write syscalls per io
transaction.

Testing by phessler@ and mlarkin@. OK mlarkin@.


# 1.47 06-Sep-2023 dv

vmd(8): clean up struct ioinfo.

In prep for fixing some vioblk device issues, simplify the ioinfo
struct by dropping members that aren't needed.

ok mlarkin@


# 1.46 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up an
unused TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.45 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.44 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


Revision tags: OPENBSD_7_3_BASE
# 1.43 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many


Revision tags: OPENBSD_7_2_BASE
# 1.42 04-May-2022 dv

vmctl(8)/vmd(8): convert disk sizes from MB to bytes

Continue converting other parts to storing data in bytes instead
of MB. In this case, the logic for disk sizes was being scaled.

This fixes issues reported by Martin Vahlensieck where vmctl could
no longer create disks larger than 7 MiB after previous commits to
change storing memory sizes as bytes.

While this keeps the vm memory limit check in vmctl's size parser,
it skips the limit check for disks. The error messages adjust
accordingly and this removes the double error message logging.

Update comments and function types accordingly.

ok marlkin@


Revision tags: OPENBSD_7_0_BASE OPENBSD_7_1_BASE
# 1.41 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.40 21-Jun-2021 dv

vmd(8): support variable length vionet rx descriptor chains

The original implementation of the virtio network device assumed a
driver would only provide a 2-descriptor chain for receiving packets.
The virtio spec allows for variable length chains and drivers, in
practice, construct them when they use a sufficiently large MTU.

This change lets the device use variable length chains provided by
the driver, thus allowing for drivers to set an MTU up to the
underlying host-side tap(4)'s limit of TUNMRU (16384).

Size limitations are now enforced on both tx and rx-side dropping
anything violating the underlying tap(4) min and max limits.

More work is needed to increase the read(2) buffer in use by vmd
to prevent packet truncation.

OK mlarkin@


# 1.39 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


# 1.38 21-Apr-2021 dv

Fix packet size checks and remove bad casts.

Because dhcpsz was an uninitialized ssize_t, it was possible that a
garbage "packet" would be queued on the receiving end of the virtio
network device.

Change the type to size_t and add proper checks based on it being
greater than zero. Remove the cast of ssize_t to uint64_t that also
caused garbage sizes when dhcpsz was unintialized and set at runtime
to something < 0.


Revision tags: OPENBSD_6_9_BASE
# 1.37 29-Mar-2021 dv

branches: 1.37.2;
Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.36 07-Jan-2021 tracey

bump VM shutdown event timeout ok mlarkin@ stsp@ florian@

VMs with addition package daemons were not given enough time to shutdown
gracefully.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.35 11-Dec-2019 pd

branches: 1.35.6;
vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.34 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


# 1.33 26-Nov-2018 reyk

Move the {qcow2,raw} create functions from vmctl into vmd/vio{qcow2,raw}.c

This way they are in the appropriate place and code can be shared with vmd.

Ok ori@ mlarkin@ ccardenas@


# 1.32 19-Oct-2018 reyk

Add support to create and convert disk images from existing images

The -i option to vmctl create (eg. vmctl create output.qcow2 -i input.img)
lets you create a new image from an input file and convert it if it is a
different format. This allows to convert qcow2 images from raw images,
raw from qcow2, or even qcow2 from qcow2 and raw from raw to re-optimize
the disk.

This re-uses Ori's vioqcow2.c from vmd by reaching into it and
compiling it in. The API has been adjust to be used from both vmctl
and vmd accordingly.

OK mlarkin@


Revision tags: OPENBSD_6_4_BASE
# 1.31 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@
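
Note on 1.31: the lookup order described above (derived image first, then
the base image) can be sketched as follows. This is a minimal illustration
only, not the code in vioqcow2.c; struct image, read_cluster() and the tiny
cluster size are hypothetical names and values chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CLUSTER_SIZE	4	/* tiny cluster size, for illustration only */
#define NCLUSTERS	2

/* Hypothetical in-memory image: each cluster is either present or absent. */
struct image {
	const struct image *base;	/* NULL when there is no base image */
	int present[NCLUSTERS];		/* nonzero if this image holds the cluster */
	char data[NCLUSTERS][CLUSTER_SIZE];
};

/*
 * Read one cluster: start in the derived image and walk toward the
 * base until some image holds the cluster. A cluster found in no
 * image reads back as zeroes.
 */
void
read_cluster(const struct image *img, size_t idx, char *out)
{
	for (; img != NULL; img = img->base) {
		if (img->present[idx]) {
			memcpy(out, img->data[idx], CLUSTER_SIZE);
			return;
		}
	}
	memset(out, 0, CLUSTER_SIZE);
}
```

Because the derived image only stores the clusters that differ, changing
the base image changes what the derived image reads back, which is the
limitation the commit message mentions.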


# 1.30 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.29 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.28 09-Sep-2018 ccardenas

Add initial qcow2 image support.

Users are able to declare disk images as 'raw' or 'qcow2' using either
vmctl or vm.conf. The default disk image format is 'raw' if not specified.

Examples of using disk format:

vmctl start bsd -Lc -r cd64.iso -d qcow2:current.qc2
or
vmctl start bsd -Lc -r cd64.iso -d raw:current.raw
is equivalent to
vmctl start bsd -Lc -r cd64.iso -d current.raw

in vm.conf
vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.qc2" format "qcow2"
	interface { switch "external" }
}

or

vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.raw" format "raw"
	interface { switch "external" }
}

is equivalent to

vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.raw"
	interface { switch "external" }
}

Tested by many.

Big Thanks to Ori Bernstein.
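
Note on 1.28: the "format:path" form of the -d argument shown above, with
'raw' as the default, could be split roughly like this. It is a sketch
under assumptions, not vmctl's actual parser; enum disk_format and
parse_disk_spec() are hypothetical names.

```c
#include <assert.h>
#include <string.h>

enum disk_format { DISK_RAW, DISK_QCOW2 };

/*
 * Split an optional "raw:" or "qcow2:" prefix off a disk argument.
 * Stores the remaining path in *path and returns the format. An
 * argument without a recognized prefix is treated as a raw image.
 */
enum disk_format
parse_disk_spec(const char *spec, const char **path)
{
	if (strncmp(spec, "qcow2:", 6) == 0) {
		*path = spec + 6;
		return DISK_QCOW2;
	}
	if (strncmp(spec, "raw:", 4) == 0) {
		*path = spec + 4;
		return DISK_RAW;
	}
	*path = spec;
	return DISK_RAW;
}
```

With this sketch, "raw:current.raw" and "current.raw" yield the same format
and path, matching the equivalence stated in the commit message.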


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and cleanup some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in the respective function (vioscsi_handle_xxx)
which allows more functionality to be added more easily.

No functional changes confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.17 08-May-2017 reyk

Adds functions to read and write state of devices in vmd.

This is required for implementing vmctl send and vmctl receive. vmctl
send / receive are two new options that will support snapshotting VMs
and migrating VMs from one host to another. The atomicio files are
copied from usr.bin/ssh.

Patch from Pratik Vyas; this project was undertaken at San Jose State
University along with his three teammates, Ashwin, Harshada and Siri
with mlarkin@ as the advisor.

OK mlarkin@


# 1.16 02-May-2017 mlarkin

Resynchronize the guest RTC via vmmci(4) on host resume from zzz/ZZZ
(vmd part)

This feature is for OpenBSD guests only.

ok reyk, kettenis


# 1.15 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
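
Note on 1.15: a /31 holds exactly two addresses, one for the host's tap(4)
interface and one handed to the VM. The sketch below shows only that
pairing; it does not claim which of the two vmd assigns to the host, and
slash31_peer() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given one IPv4 address of a /31 (in host byte order), return the
 * other address of the pair. A /31 leaves a single host bit, so the
 * two addresses differ only in the lowest bit.
 */
uint32_t
slash31_peer(uint32_t addr)
{
	return addr ^ 1U;
}
```

For example, with the illustrative pair 100.64.0.2/31, the peer of
100.64.0.2 is 100.64.0.3 and vice versa.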


Revision tags: OPENBSD_6_1_BASE
# 1.14 27-Mar-2017 deraadt

die whitespace die die die


# 1.13 26-Mar-2017 mlarkin

Implement a missing command in vioblk and allow > MAXPHYS transfers.

This diff (with the others previously committed) allows ubuntu 14.04
amd64 guests to work.


# 1.12 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.11 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Use of a timeout and VM ACKs makes sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.10 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch; the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@
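
Note on 1.10: one way to enforce a locked link-layer address is to compare
the source MAC of every frame the guest transmits against the address
configured for the interface. The following is a sketch of that check under
this assumption, not the vionet code; src_lladdr_ok() and the constants are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN		6	/* bytes in an Ethernet address */
#define FRAME_HDR_LEN	14	/* dst (6) + src (6) + ethertype (2) */

/*
 * Return 1 if the frame's source MAC equals the configured address,
 * 0 otherwise (including frames too short to carry an Ethernet
 * header). The source address follows the 6-byte destination.
 */
int
src_lladdr_ok(const uint8_t *frame, size_t len, const uint8_t lladdr[MAC_LEN])
{
	if (len < FRAME_HDR_LEN)
		return 0;
	return memcmp(frame + MAC_LEN, lladdr, MAC_LEN) == 0;
}
```

A check of this kind needs no cooperation from the switch or bridge, which
is consistent with the independence claimed in the commit message.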


# 1.9 21-Jan-2017 mlarkin

updated include paths for recently moved virtio stuff


# 1.8 19-Jan-2017 reyk

Export the host time to the guest, add it as a timedelta sensor in vmmci(4)

OK kettenis@ mlarkin@


# 1.7 13-Jan-2017 reyk

Add host side of vmmci(4) to vmd(8).

It currently uses the device to request graceful shutdown of a VM on
"vmctl stop myvm" but will be extended for reboot and other edge cases.

OK mlarkin@


# 1.6 12-Oct-2016 mlarkin

Allow 4 vio(4) interfaces in each VM. Also fix a bad interrupt assignment that
caused IRQ9 to be shared between the second disk device and the vio(4)s,
which caused poor network performance.

ok reyk, stefan


# 1.5 02-Sep-2016 stefan

Process incoming host->guest packets asynchronously to running VCPU

This registers a handler with libevent that is called on incoming packets
for the guest. If they cannot be handled immediately (because the virtq is
full), make sure they are handled on VCPU exits.

ok mlarkin@


Revision tags: OPENBSD_6_0_BASE
# 1.4 09-Jul-2016 stefan

Prepare vionet to be handled asynchronously to the VCPU thread

This splits the handling of received data into a separate function
that can later be called in parallel to the VCPU thread instead of
handling received packets on VCPU exits only.

It also makes virtq accesses in the rx path safe to run in parallel
to the VCPU thread: the last index into the 'avail' ring the driver
has notified to the host is kept track of. It also makes sure that
the host only writes back to the 'avail' ring instead of modifying
the whole receive virtq.

While there, describe what virtio_vq_info and virtio_io_cfg are used
for, as suggested by mlarkin@

ok mlarkin@
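
Note on 1.4: keeping track of the last 'avail' index seen lets the device
side work out how many new buffers the driver has offered without copying
the whole virtq. A minimal sketch of that bookkeeping, assuming the usual
free-running 16-bit index; vq_new_avail() is a hypothetical name, not a
function from virtio.h.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of buffers the driver has made available since the device
 * last looked. Both indices are free-running 16-bit counters, so the
 * subtraction is done modulo 2^16 and survives wraparound.
 */
uint16_t
vq_new_avail(uint16_t avail_idx, uint16_t last_avail)
{
	return (uint16_t)(avail_idx - last_avail);
}
```

After processing the new buffers, the device would store avail_idx as its
new last_avail so the next call only reports later additions.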


Revision tags: OPENBSD_5_9_BASE
# 1.3 03-Dec-2015 reyk

spacing


# 1.2 22-Nov-2015 reyk

Add $ Ids


# 1.1 22-Nov-2015 mlarkin

vmd(8) - virtual machine daemon.

There is still a lot to be done, and fixed, in these userland components
but I have received enough "it works, commit it" emails that it's time
to finish those things in tree.

discussed with many, tested by many.


# 1.48 14-Sep-2023 dv

vmd(8)/vioblk: use zero-copy approach & vectored io.

The original version of the virtio block device dynamically allocated
buffers to hold intermediate data when reading or writing to the
underlying disk fd(s). Since vioblk drivers may chain multiple
segments together, this leads to overly complex logic and on
read(2)/write(2) call per data segment.

Additionally, the virtio block logic in vmd didn't handle segments
that weren't block aligned (e.g. 512 bytes). If a guest provided
unaligned segments, garbage will be read or written.

Since virtio descriptors mimic iovec structures, this changes vmd's
device emulation to use that model. (This is how other hypervisors
emulate virtio devices.) This allows for zero-copy semantics using
iovec's, reducing memcpy and multiple read/write syscalls per io
transaction.

Testing by phessler@ and mlarkin@. OK mlarkin@.


# 1.47 06-Sep-2023 dv

vmd(8): clean up struct ioinfo.

In prep for fixing some vioblk device issues, simplify the ioinfo
struct by dropping members that aren't needed.

ok mlarkin@


# 1.46 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up an
unused TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.45 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.44 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


Revision tags: OPENBSD_7_3_BASE
# 1.43 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many


Revision tags: OPENBSD_7_2_BASE
# 1.42 04-May-2022 dv

vmctl(8)/vmd(8): convert disk sizes from MB to bytes

Continue converting other parts to storing data in bytes instead
of MB. In this case, the logic for disk sizes was being scaled.

This fixes issues reported by Martin Vahlensieck where vmctl could
no longer create disks larger than 7 MiB after previous commits to
change storing memory sizes as bytes.

While this keeps the vm memory limit check in vmctl's size parser,
it skips the limit check for disks. The error messages adjust
accordingly and this removes the double error message logging.

Update comments and function types accordingly.

ok marlkin@


Revision tags: OPENBSD_7_0_BASE OPENBSD_7_1_BASE
# 1.41 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.40 21-Jun-2021 dv

vmd(8): support variable length vionet rx descriptor chains

The original implementation of the virtio network device assumed a
driver would only provide a 2-descriptor chain for receiving packets.
The virtio spec allows for variable length chains and drivers, in
practice, construct them when they use a sufficiently large MTU.

This change lets the device use variable length chains provided by
the driver, thus allowing for drivers to set an MTU up to the
underlying host-side tap(4)'s limit of TUNMRU (16384).

Size limitations are now enforced on both tx and rx-side dropping
anything violating the underlying tap(4) min and max limits.
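
The chain handling described above can be sketched roughly as follows. This is not vmd's code: the descriptor layout follows the split-virtqueue format in the virtio specification, while `sketch_desc`, `SKETCH_DESC_F_NEXT`, and `sketch_chain_len` are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DESC_F_NEXT 1	/* descriptor continues via the next field */

/* Split-virtqueue descriptor layout as given in the virtio specification. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Sum the buffer space of the chain starting at head. Returns 0 on
 * success, or -1 if the chain leaves the table or contains more links
 * than the queue has descriptors (which indicates a loop).
 */
static int
sketch_chain_len(const struct sketch_desc *table, uint16_t qsize,
    uint16_t head, size_t *total)
{
	uint16_t idx = head, seen = 0;
	size_t sum = 0;

	for (;;) {
		if (idx >= qsize || seen++ >= qsize)
			return (-1);
		sum += table[idx].len;
		if (!(table[idx].flags & SKETCH_DESC_F_NEXT))
			break;
		idx = table[idx].next;
	}
	*total = sum;
	return (0);
}
```

A caller could compare the resulting total against the tap(4) minimum and maximum sizes before reading a packet into the chain, and drop the packet if it does not fit.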

More work is needed to increase the read(2) buffer in use by vmd
to prevent packet truncation.

OK mlarkin@


# 1.39 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


# 1.38 21-Apr-2021 dv

Fix packet size checks and remove bad casts.

Because dhcpsz was an uninitialized ssize_t, it was possible that a
garbage "packet" would be queued on the receiving end of the virtio
network device.

Change the type to size_t and add proper checks based on it being
greater than zero. Remove the cast of ssize_t to uint64_t that also
caused garbage sizes when dhcpsz was uninitialized and set at runtime
to something < 0.
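
To show why the cast produced garbage sizes, here is a small sketch. The helper names are hypothetical and the code is not taken from vmd; it only demonstrates the signed-to-unsigned conversion and one way to guard against it.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Casting a negative ssize_t to uint64_t wraps around to a huge value. */
static uint64_t
sketch_unchecked_size(ssize_t n)
{
	return ((uint64_t)n);
}

/*
 * Accept only strictly positive lengths. Returns 1 and stores the
 * length in *out when n is usable, 0 otherwise (*out is left alone).
 */
static int
sketch_checked_size(ssize_t n, size_t *out)
{
	if (n <= 0)
		return (0);
	*out = (size_t)n;
	return (1);
}
```

With the unchecked cast, a value of -1 becomes the largest 64-bit unsigned integer, so a size comparison against it would always pass.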


Revision tags: OPENBSD_6_9_BASE
# 1.37 29-Mar-2021 dv

branches: 1.37.2;
Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.36 07-Jan-2021 tracey

bump VM shutdown event timeout ok mlarkin@ stsp@ florian@

VMs with additional package daemons were not given enough time to shut down
gracefully.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.35 11-Dec-2019 pd

branches: 1.35.6;
vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block, also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.34 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


# 1.33 26-Nov-2018 reyk

Move the {qcow2,raw} create functions from vmctl into vmd/vio{qcow2,raw}.c

This way they are in the appropriate place and code can be shared with vmd.

Ok ori@ mlarkin@ ccardenas@


# 1.32 19-Oct-2018 reyk

Add support to create and convert disk images from existing images

The -i option to vmctl create (eg. vmctl create output.qcow2 -i input.img)
lets you create a new image from an input file and convert it if it is a
different format. This allows creating qcow2 images from raw images,
raw from qcow2, or even qcow2 from qcow2 and raw from raw to re-optimize
the disk.

This re-uses Ori's vioqcow2.c from vmd by reaching into it and
compiling it in. The API has been adjusted to be used from both vmctl
and vmd accordingly.

OK mlarkin@


Revision tags: OPENBSD_6_4_BASE
# 1.31 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.
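
The lookup order described above can be illustrated with a toy model. This is not the qcow2 on-disk format or vmd's vioqcow2.c: `sketch_image` and `sketch_read_cluster` are hypothetical, and each "cluster" is reduced to a presence flag and a single byte.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NCLUSTERS 4

/* Toy image: a presence flag and one byte of data per cluster. */
struct sketch_image {
	const struct sketch_image *base;	/* NULL for a base image */
	uint8_t present[SKETCH_NCLUSTERS];
	uint8_t data[SKETCH_NCLUSTERS];
};

/*
 * Read one cluster, starting in the given image and falling through to
 * its base image(s). A cluster allocated nowhere reads as zero.
 */
static uint8_t
sketch_read_cluster(const struct sketch_image *img, size_t cluster)
{
	if (cluster >= SKETCH_NCLUSTERS)
		return (0);
	for (; img != NULL; img = img->base) {
		if (img->present[cluster])
			return (img->data[cluster]);
	}
	return (0);
}
```

Because reads of unallocated clusters fall through to the base, any later write to the base silently changes what the derived image returns for those clusters.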

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.30 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.29 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.28 09-Sep-2018 ccardenas

Add initial qcow2 image support.

Users are able to declare disk images as 'raw' or 'qcow2' using either
vmctl or vm.conf. The default disk image format is 'raw' if not specified.

Examples of using disk format:

vmctl start bsd -Lc -r cd64.iso -d qcow2:current.qc2
or
vmctl start bsd -Lc -r cd64.iso -d raw:current.raw
is equivalent to
vmctl start bsd -Lc -r cd64.iso -d current.raw

in vm.conf
vm "current" {
    disable
    memory 2G
    disk "/home/user/vmm/current.qc2" format "qcow2"
    interface { switch "external" }
}

or

vm "current" {
    disable
    memory 2G
    disk "/home/user/vmm/current.raw" format "raw"
    interface { switch "external" }
}

is equivalent to

vm "current" {
    disable
    memory 2G
    disk "/home/user/vmm/current.raw"
    interface { switch "external" }
}

Tested by many.

Big Thanks to Ori Bernstein.


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and cleanup some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in the respective function (vioscsi_handle_xxx)
which makes it easier to add more functionality.

No functional changes confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
  CentOS 6 (secondary), Ubuntu 17.10 (secondary).
  NOTE: Secondary indicates some issue(s) preventing full/reliable
  functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
  default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.17 08-May-2017 reyk

Adds functions to read and write state of devices in vmd.

This is required for implementing vmctl send and vmctl receive. vmctl
send / receive are two new options that will support snapshotting VMs
and migrating VMs from one host to another. The atomicio files are
copied from usr.bin/ssh.

Patch from Pratik Vyas; this project was undertaken at San Jose State
University along with his three teammates, Ashwin, Harshada and Siri
with mlarkin@ as the advisor.

OK mlarkin@


# 1.16 02-May-2017 mlarkin

Resynchronize the guest RTC via vmmci(4) on host resume from zzz/ZZZ
(vmd part)

This feature is for OpenBSD guests only.

ok reyk, kettenis


# 1.15 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.
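
Since a /31 holds exactly two addresses, the other end of the link can be derived by flipping the lowest bit. The sketch below assumes an IPv4 address in host byte order. `sketch_slash31_peer` is a hypothetical helper, and which of the two addresses vmd gives to the host or the guest is not stated here.

```c
#include <stdint.h>

/*
 * In a /31 the two addresses differ only in the lowest bit, so the
 * peer of either end is found by toggling that bit.
 * The address is expected in host byte order.
 */
static uint32_t
sketch_slash31_peer(uint32_t addr)
{
	return (addr ^ 1U);
}
```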

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.14 27-Mar-2017 deraadt

die whitespace die die die


# 1.13 26-Mar-2017 mlarkin

Implement a missing command in vioblk and allow > MAXPHYS transfers.

This diff (with the others previously committed) allows ubuntu 14.04
amd64 guests to work.


# 1.12 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.11 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs makes sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.10 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch; the
implementation is independent from the underlying switch or bridge.
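
One way such a check could look is sketched below: compare the source address of each frame sent by the guest against the configured address and drop the frame on mismatch. The helper and constant names are hypothetical and the code is not taken from vmd.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETHER_ADDR_LEN 6
#define SKETCH_ETHER_HDR_LEN 14

/*
 * Return 1 if the frame's source MAC matches the configured address,
 * 0 if the frame is too short or the address differs (frame should be
 * dropped).
 */
static int
sketch_src_mac_ok(const uint8_t *frame, size_t len,
    const uint8_t mac[SKETCH_ETHER_ADDR_LEN])
{
	if (len < SKETCH_ETHER_HDR_LEN)
		return (0);
	/* Ethernet header: destination (6), source (6), type (2). */
	return (memcmp(frame + SKETCH_ETHER_ADDR_LEN, mac,
	    SKETCH_ETHER_ADDR_LEN) == 0);
}
```

Doing the comparison in the device emulation, rather than on the switch, is what keeps the feature independent of the host-side bridge configuration.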

no objections mlarkin@


# 1.9 21-Jan-2017 mlarkin

updated include paths for recently moved virtio stuff


# 1.8 19-Jan-2017 reyk

Export the host time to the guest, add it as a timedelta sensor in vmmci(4)

OK kettenis@ mlarkin@


# 1.7 13-Jan-2017 reyk

Add host side of vmmci(4) to vmd(8).

It currently uses the device to request graceful shutdown of a VM on
"vmctl stop myvm" but will be extended for reboot and other edge cases.

OK mlarkin@


# 1.6 12-Oct-2016 mlarkin

Allow 4 vio(4) interfaces in each VM. Also fix a bad interrupt assignment that
caused IRQ9 to be shared between the second disk device and the vio(4)s,
which caused poor network performance.

ok reyk, stefan


# 1.5 02-Sep-2016 stefan

Process incoming host->guest packets asynchronously to running VCPU

This registers a handler with libevent that is called on incoming packets
for the guest. If they cannot be handled immediately (because the virtq is
full), make sure they are handled on VCPU exits.

ok mlarkin@


Revision tags: OPENBSD_6_0_BASE
# 1.4 09-Jul-2016 stefan

Prepare vionet to be handled asynchronously to the VCPU thread

This splits the handling of received data into a separate function
that can later be called in parallel to the VCPU thread instead of
handling received packets on VCPU exits only.

It also makes virtq accesses in the rx path safe to run in parallel
to the VCPU thread: the last index into the 'avail' ring the driver
has notified to the host is kept track of. It also makes sure that
the host only writes back to the 'avail' ring instead of modifying
the whole receive virtq.
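
The index tracking mentioned above relies on the ring indices being free-running 16-bit counters. A minimal sketch of that arithmetic follows; the helper names are hypothetical and the queue size is assumed to be a power of two, as it is for split virtqueues.

```c
#include <stdint.h>

/*
 * Number of entries the driver has published that the device has not
 * yet consumed. Both values are free-running 16-bit counters, so the
 * subtraction is done modulo 65536 and stays correct across wraparound.
 */
static uint16_t
sketch_avail_pending(uint16_t avail_idx, uint16_t last_avail)
{
	return ((uint16_t)(avail_idx - last_avail));
}

/* Ring slot for a free-running index; qsize must be a power of two. */
static uint16_t
sketch_ring_slot(uint16_t idx, uint16_t qsize)
{
	return ((uint16_t)(idx & (qsize - 1)));
}
```

Keeping the last seen index on the device side means the rx path only needs to read the driver's index, rather than copying and rewriting the whole queue.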

While there, describe what virtio_vq_info and virtio_io_cfg are used
for, as suggested by mlarkin@

ok mlarkin@


Revision tags: OPENBSD_5_9_BASE
# 1.3 03-Dec-2015 reyk

spacing


# 1.2 22-Nov-2015 reyk

Add $ Ids


# 1.1 22-Nov-2015 mlarkin

vmd(8) - virtual machine daemon.

There is still a lot to be done, and fixed, in these userland components
but I have received enough "it works, commit it" emails that it's time
to finish those things in tree.

discussed with many, tested by many.


# 1.46 13-Jul-2023 dv

vmd(8): pull validation into local prefix parser.

Validation for local prefixes, both inet and inet6, was scattered
around. To make it even more confusing, vmd was using generic address
parsing logic from prior network daemons. vmd doesn't need to parse
addresses other than when parsing the local prefix settings in
vm.conf and no runtime parsing is needed.

This change merges parsing and validation based on vmd's specific
needs for local prefixes (e.g. reserving enough bits for vm id and
network interface id encoding in an ipv4 address). In addition, it
simplifies the struct from a generic address struct to one focused
on just storing the v4 and v6 prefixes and masks. This cleans up an
unused TAILQ struct member that isn't used by vmd and was leftover
copy-pasta from those prior daemons.

The address parsing that vmd uses is also updated to using the
latest logic in bgpd(8).

ok mlarkin@


# 1.45 27-Apr-2023 dv

vmd(8): introduce multi-process model for virtio devices.

Isolate virtio network and block device emulation in dedicated
processes, forked and exec'd from the vm process. This allows for
tightening pledge promises to just "stdio".

Communication between the vcpu's and these devices now occurs via
imsg channels, which adds the benefit of not always blocking the
vcpu thread while emulating the device.

With this commit, it's possible that vmd is the first open source
hypervisor that *defaults* to a multi-process device emulation
model without requiring any additional configuration from the
operator.

Testing help from phessler@ and Mischa Peters.

ok mlarkin@


# 1.44 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


Revision tags: OPENBSD_7_3_BASE
# 1.43 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many
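
The pfn-versus-address distinction above can be illustrated with a minimal sketch. It assumes the legacy virtio-pci convention of 4096-byte pages for the queue pfn; the helper name vq_pfn_to_gpa and the VQ_PAGE_SHIFT macro are hypothetical and are not taken from vmd.

```c
#include <stdint.h>

/* Assumption: legacy virtio-pci queue pfns count 4096-byte pages. */
#define VQ_PAGE_SHIFT	12

/*
 * Convert the 32-bit page frame number written by the driver into a
 * guest-physical byte address. The widening cast happens before the
 * shift so that large pfns do not overflow 32 bits.
 */
static inline uint64_t
vq_pfn_to_gpa(uint32_t pfn)
{
	return (uint64_t)pfn << VQ_PAGE_SHIFT;
}
```

Storing the pfn and converting it at the point of use keeps the device state identical to what the driver actually wrote.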


Revision tags: OPENBSD_7_2_BASE
# 1.42 04-May-2022 dv

vmctl(8)/vmd(8): convert disk sizes from MB to bytes

Continue converting other parts to storing data in bytes instead
of MB. In this case, the logic for disk sizes was being scaled.

This fixes issues reported by Martin Vahlensieck where vmctl could
no longer create disks larger than 7 MiB after previous commits to
change storing memory sizes as bytes.

While this keeps the vm memory limit check in vmctl's size parser,
it skips the limit check for disks. The error messages adjust
accordingly and this removes the double error message logging.

Update comments and function types accordingly.

ok mlarkin@


Revision tags: OPENBSD_7_0_BASE OPENBSD_7_1_BASE
# 1.41 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.40 21-Jun-2021 dv

vmd(8): support variable length vionet rx descriptor chains

The original implementation of the virtio network device assumed a
driver would only provide a 2-descriptor chain for receiving packets.
The virtio spec allows for variable length chains and drivers, in
practice, construct them when they use a sufficiently large MTU.

This change lets the device use variable length chains provided by
the driver, thus allowing for drivers to set an MTU up to the
underlying host-side tap(4)'s limit of TUNMRU (16384).

Size limitations are now enforced on both tx and rx-side dropping
anything violating the underlying tap(4) min and max limits.

More work is needed to increase the read(2) buffer in use by vmd
to prevent packet truncation.

OK mlarkin@
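
As a rough illustration of what handling a variable length chain involves, the sketch below walks a split-ring descriptor chain and sums the buffer space it offers. The descriptor layout and the NEXT flag value follow the virtio specification; the struct name, the function name chain_capacity and the loop guard are assumptions for illustration and do not reflect vmd's actual code.

```c
#include <stddef.h>
#include <stdint.h>

#define DESC_F_NEXT	1	/* same value as VRING_DESC_F_NEXT in the spec */

/* Split-ring descriptor layout as defined by the virtio spec. */
struct desc {
	uint64_t addr;	/* guest-physical buffer address */
	uint32_t len;	/* buffer length in bytes */
	uint16_t flags;	/* DESC_F_NEXT when another descriptor follows */
	uint16_t next;	/* index of the next descriptor in the chain */
};

/*
 * Sum the buffer lengths of the chain starting at head. Returns 0 when
 * the chain is malformed: an index outside the table, or more hops than
 * the table has entries (which indicates a loop).
 */
static size_t
chain_capacity(const struct desc *table, uint16_t qsize, uint16_t head)
{
	size_t total = 0;
	unsigned int hops = 0;
	uint16_t idx = head;

	for (;;) {
		if (idx >= qsize || hops++ >= qsize)
			return 0;
		total += table[idx].len;
		if (!(table[idx].flags & DESC_F_NEXT))
			return total;
		idx = table[idx].next;
	}
}
```

A device could compare the returned capacity against the incoming packet size before copying, and drop the packet if the chain is too small.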


# 1.39 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


# 1.38 21-Apr-2021 dv

Fix packet size checks and remove bad casts.

Because dhcpsz was an uninitialized ssize_t, it was possible that a
garbage "packet" would be queued on the receiving end of the virtio
network device.

Change the type to size_t and add proper checks based on it being
greater than zero. Remove the cast of ssize_t to uint64_t that also
caused garbage sizes when dhcpsz was uninitialized and set at runtime
to something < 0.
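
The class of bug described above can be reproduced in isolation. The following is a minimal sketch, not vmd's code: the function names are hypothetical, and it only shows why casting a negative ssize_t to uint64_t yields a garbage size and how a greater-than-zero check avoids it.

```c
#include <stdint.h>
#include <sys/types.h>

/*
 * Buggy pattern: a negative ssize_t converts to a very large unsigned
 * value, which then looks like a valid (huge) packet size.
 */
static uint64_t
size_unchecked(ssize_t sz)
{
	return (uint64_t)sz;
}

/*
 * Safer pattern: anything that is not strictly positive is treated as
 * "no packet", so a negative or zero value never reaches the queue.
 */
static size_t
size_checked(ssize_t sz)
{
	return sz > 0 ? (size_t)sz : 0;
}
```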


Revision tags: OPENBSD_6_9_BASE
# 1.37 29-Mar-2021 dv

branches: 1.37.2;
Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.36 07-Jan-2021 tracey

bump VM shutdown event timeout ok mlarkin@ stsp@ florian@

VMs with additional package daemons were not given enough time to shut down
gracefully.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.35 11-Dec-2019 pd

branches: 1.35.6;
vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@
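
The blocking pause described above follows the usual condition variable pattern. The sketch below is a generic illustration under that assumption; the struct and function names are hypothetical and are not the identifiers used in vmd.

```c
#include <pthread.h>
#include <stdbool.h>

struct pause_state {
	pthread_mutex_t mtx;
	pthread_cond_t paused_cond;	/* signalled once the vcpu has paused */
	bool pause_requested;
	bool paused;
};

/*
 * Requester side: ask for a pause and block until the vcpu thread
 * acknowledges it, instead of sleeping for a fixed interval.
 */
static void
request_pause(struct pause_state *ps)
{
	pthread_mutex_lock(&ps->mtx);
	ps->pause_requested = true;
	while (!ps->paused)
		pthread_cond_wait(&ps->paused_cond, &ps->mtx);
	pthread_mutex_unlock(&ps->mtx);
}

/*
 * vcpu side: called when the thread reaches a safe point. If a pause
 * was requested, record the paused state and wake any waiters.
 */
static void
ack_pause(struct pause_state *ps)
{
	pthread_mutex_lock(&ps->mtx);
	if (ps->pause_requested) {
		ps->paused = true;
		pthread_cond_broadcast(&ps->paused_cond);
	}
	pthread_mutex_unlock(&ps->mtx);
}
```

The while loop around pthread_cond_wait guards against spurious wakeups, so the requester only returns once the paused flag is really set.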


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.34 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


# 1.33 26-Nov-2018 reyk

Move the {qcow2,raw} create functions from vmctl into vmd/vio{qcow2,raw}.c

This way they are in the appropriate place and code can be shared with vmd.

Ok ori@ mlarkin@ ccardenas@


# 1.32 19-Oct-2018 reyk

Add support to create and convert disk images from existing images

The -i option to vmctl create (eg. vmctl create output.qcow2 -i input.img)
lets you create a new image from an input file and convert it if it is a
different format. This allows creating qcow2 images from raw images,
raw from qcow2, or even qcow2 from qcow2 and raw from raw to re-optimize
the disk.

This re-uses Ori's vioqcow2.c from vmd by reaching into it and
compiling it in. The API has been adjusted to be used from both vmctl
and vmd accordingly.

OK mlarkin@


Revision tags: OPENBSD_6_4_BASE
# 1.31 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@
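
The lookup order described above (derived image first, then the base image) can be sketched as a walk along a chain of images. This is a simplified in-memory model with hypothetical names; it does not reflect the on-disk qcow2 layout or the functions in vioqcow2.c.

```c
#include <stddef.h>

/* Simplified model: each image maps cluster indices to data or NULL. */
struct image {
	const char *const *clusters;	/* NULL entry means "not allocated here" */
	size_t nclusters;
	const struct image *base;	/* NULL for the bottom-most image */
};

/*
 * Return the data for cluster idx, searching the derived image first
 * and falling back to each base image in turn. NULL means no image in
 * the chain holds the cluster.
 */
static const char *
lookup_cluster(const struct image *img, size_t idx)
{
	for (; img != NULL; img = img->base) {
		if (idx < img->nclusters && img->clusters[idx] != NULL)
			return img->clusters[idx];
	}
	return NULL;
}
```

The model also shows why modifying the base image is unsafe: any cluster the derived image has not overridden is read straight from the base.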


# 1.30 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.29 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.28 09-Sep-2018 ccardenas

Add initial qcow2 image support.

Users are able to declare disk images as 'raw' or 'qcow2' using either
vmctl or vm.conf. The default disk image format is 'raw' if not specified.

Examples of using disk format:

vmctl start bsd -Lc -r cd64.iso -d qcow2:current.qc2
or
vmctl start bsd -Lc -r cd64.iso -d raw:current.raw
is equivalent to
vmctl start bsd -Lc -r cd64.iso -d current.raw

in vm.conf
vm "current" {
    disable
    memory 2G
    disk "/home/user/vmm/current.qc2" format "qcow2"
    interface { switch "external" }
}

or

vm "current" {
    disable
    memory 2G
    disk "/home/user/vmm/current.raw" format "raw"
    interface { switch "external" }
}

is equivalent to

vm "current" {
    disable
    memory 2G
    disk "/home/user/vmm/current.raw"
    interface { switch "external" }
}

Tested by many.

Big Thanks to Ori Bernstein.


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and cleanup some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in the respective function (vioscsi_handle_xxx)
which makes it easier to add more functionality.

No functional changes confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.17 08-May-2017 reyk

Adds functions to read and write state of devices in vmd.

This is required for implementing vmctl send and vmctl receive. vmctl
send / receive are two new options that will support snapshotting VMs
and migrating VMs from one host to another. The atomicio files are
copied from usr.bin/ssh.

Patch from Pratik Vyas; this project was undertaken at San Jose State
University along with his three teammates, Ashwin, Harshada and Siri
with mlarkin@ as the advisor.

OK mlarkin@


# 1.16 02-May-2017 mlarkin

Resynchronize the guest RTC via vmmci(4) on host resume from zzz/ZZZ
(vmd part)

This feature is for OpenBSD guests only.

ok reyk, kettenis


# 1.15 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
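
A /31 subnet contains exactly two addresses, which differ only in the lowest bit. The sketch below shows that relationship for an IPv4 address held in host byte order; the function name is hypothetical, and the sketch makes no claim about which of the two addresses vmd assigns to the host or to the VM.

```c
#include <stdint.h>

/*
 * Given one address of a /31 pair (host byte order), return the other.
 * The two addresses share the upper 31 bits and differ in the last bit.
 */
static uint32_t
slash31_peer(uint32_t addr)
{
	return addr ^ 1u;
}
```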


Revision tags: OPENBSD_6_1_BASE
# 1.14 27-Mar-2017 deraadt

die whitespace die die die


# 1.13 26-Mar-2017 mlarkin

Implement a missing command in vioblk and allow > MAXPHYS transfers.

This diff (with the others previously committed) allows ubuntu 14.04
amd64 guests to work.


# 1.12 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.11 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.10 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.9 21-Jan-2017 mlarkin

updated include paths for recently moved virtio stuff


# 1.8 19-Jan-2017 reyk

Export the host time to the guest, add it as a timedelta sensor in vmmci(4)

OK kettenis@ mlarkin@


# 1.7 13-Jan-2017 reyk

Add host side of vmmci(4) to vmd(8).

It currently uses the device to request graceful shutdown of a VM on
"vmctl stop myvm" but will be extended for reboot and other edge cases.

OK mlarkin@


# 1.6 12-Oct-2016 mlarkin

Allow 4 vio(4) interfaces in each VM. Also fix a bad interrupt assignment that
caused IRQ9 to be shared between the second disk device and the vio(4)s,
which caused poor network performance.

ok reyk, stefan


# 1.5 02-Sep-2016 stefan

Process incoming host->guest packets asynchronously to running VCPU

This registers a handler with libevent that is called on incoming packets
for the guest. If they cannot be handled immediately (because the virtq is
full), make sure they are handled on VCPU exits.

ok mlarkin@


Revision tags: OPENBSD_6_0_BASE
# 1.4 09-Jul-2016 stefan

Prepare vionet to be handled asynchronously to the VCPU thread

This splits the handling of received data into a separate function
that can later be called in parallel to the VCPU thread instead of
handling received packets on VCPU exits only.

It also makes virtq accesses in the rx path safe to run in parallel
to the VCPU thread: the last index into the 'avail' ring the driver
has notified to the host is kept track of. It also makes sure that
the host only writes back to the 'avail' ring instead of modifying
the whole receive virtq.

While there, describe what virtio_vq_info and virtio_io_cfg are used
for, as suggested by mlarkin@

ok mlarkin@


Revision tags: OPENBSD_5_9_BASE
# 1.3 03-Dec-2015 reyk

spacing


# 1.2 22-Nov-2015 reyk

Add $ Ids


# 1.1 22-Nov-2015 mlarkin

vmd(8) - virtual machine daemon.

There is still a lot to be done, and fixed, in these userland components
but I have received enough "it works, commit it" emails that it's time
to finish those things in tree.

discussed with many, tested by many.


# 1.44 25-Apr-2023 dv

vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.

The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.

This allows us to clean up some of vmd(8) and simplify things for
switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd,
etc.) to allow users to boot recovery ramdisk kernels.

ok mlarkin@


Revision tags: OPENBSD_7_3_BASE
# 1.43 23-Dec-2022 dv

vmd(8): implement zero-copy operations on virtqueues.

The original virtio device implementation relied on allocating a
buffer on heap, copying the virtqueue from the guest, mutating the
copy, and then overwriting the virtqueue in the guest.

While the approach worked, it was both complex and added extra
overhead. On older hardware, switching to the zero-copy approach
can show a noticeable performance improvement for vionet devices.
An added benefit is this diff also reduces the amount of code in
vmd, which is always a welcome change.

In addition, change to talking about the queue pfn and not "address"
as the virtio-pci spec has drivers provide a 32-bit value representing
the physical page number of the location in guest memory, not the
linear address.

Original idea from dlg@ while working on re-adding async task queues.

ok dlg@, tested by many


Revision tags: OPENBSD_7_2_BASE
# 1.42 04-May-2022 dv

vmctl(8)/vmd(8): convert disk sizes from MB to bytes

Continue converting other parts to storing data in bytes instead
of MB. In this case, the logic for disk sizes was being scaled.

This fixes issues reported by Martin Vahlensieck where vmctl could
no longer create disks larger than 7 MiB after previous commits to
change storing memory sizes as bytes.

While this keeps the vm memory limit check in vmctl's size parser,
it skips the limit check for disks. The error messages adjust
accordingly and this removes the double error message logging.

Update comments and function types accordingly.

ok marlkin@


Revision tags: OPENBSD_7_0_BASE OPENBSD_7_1_BASE
# 1.41 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.40 21-Jun-2021 dv

vmd(8): support variable length vionet rx descriptor chains

The original implementation of the virtio network device assumed a
driver would only provide a 2-descriptor chain for receiving packets.
The virtio spec allows for variable length chains and drivers, in
practice, construct them when they use a sufficiently large MTU.

This change lets the device use variable length chains provided by
the driver, thus allowing for drivers to set an MTU up to the
underlying host-side tap(4)'s limit of TUNMRU (16384).

Size limitations are now enforced on both tx and rx-side dropping
anything violating the underlying tap(4) min and max limits.

More work is needed to increase the read(2) buffer in use by vmd
to prevent packet truncation.
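
A minimal sketch of walking a variable-length rx descriptor chain follows. The descriptor layout and the NEXT/WRITE flag values follow the virtio split-virtqueue format; the ex_ names, the hop limit, and the error convention are assumptions for illustration and do not reflect vmd's actual code.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_DESC_F_NEXT	1	/* chain continues at 'next' */
#define EX_DESC_F_WRITE	2	/* buffer is writable by the device */

struct ex_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Hypothetical helper: total writable bytes in the chain starting at
 * 'head', or -1 if the chain is malformed (bad index, read-only buffer
 * in an rx chain, or a loop longer than the queue size).
 */
static int64_t
ex_chain_capacity(const struct ex_desc *table, uint16_t qsize, uint16_t head)
{
	int64_t total = 0;
	uint16_t idx = head, hops;

	for (hops = 0; hops < qsize; hops++) {
		if (idx >= qsize)
			return -1;
		if (!(table[idx].flags & EX_DESC_F_WRITE))
			return -1;
		total += table[idx].len;
		if (!(table[idx].flags & EX_DESC_F_NEXT))
			return total;
		idx = table[idx].next;
	}
	return -1;	/* more hops than descriptors: the chain loops */
}
```

A device could compare the returned capacity against the packet size before copying, rather than assuming exactly two descriptors.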

OK mlarkin@


# 1.39 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


# 1.38 21-Apr-2021 dv

Fix packet size checks and remove bad casts.

Because dhcpsz was an uninitialized ssize_t, it was possible that a
garbage "packet" would be queued on the receiving end of the virtio
network device.

Change the type to size_t and add proper checks based on it being
greater than zero. Remove the cast of ssize_t to uint64_t that also
caused garbage sizes when dhcpsz was uninitialized and set at runtime
to something < 0.
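
The effect of the removed cast can be shown with a small sketch. It is not the original vmd code; ex_bad_len and ex_have_packet are hypothetical names used only to show why a negative ssize_t turns into a huge length once cast to uint64_t, and what a check on an unsigned size looks like.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Casting a negative signed size to an unsigned type wraps modulo 2^64. */
static uint64_t
ex_bad_len(ssize_t sz)
{
	return (uint64_t)sz;
}

/* With an unsigned size, "is there a packet?" is a plain non-zero test. */
static int
ex_have_packet(size_t sz)
{
	return sz > 0;
}
```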


Revision tags: OPENBSD_6_9_BASE
# 1.37 29-Mar-2021 dv

branches: 1.37.2;
Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.36 07-Jan-2021 tracey

bump VM shutdown event timeout ok mlarkin@ stsp@ florian@

VMs with additional package daemons were not given enough time to shut down
gracefully.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.35 11-Dec-2019 pd

branches: 1.35.6;
vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.34 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


# 1.33 26-Nov-2018 reyk

Move the {qcow2,raw} create functions from vmctl into vmd/vio{qcow2,raw}.c

This way they are in the appropriate place and code can be shared with vmd.

Ok ori@ mlarkin@ ccardenas@


# 1.32 19-Oct-2018 reyk

Add support to create and convert disk images from existing images

The -i option to vmctl create (e.g. vmctl create output.qcow2 -i input.img)
lets you create a new image from an input file and convert it if it is a
different format. This allows converting qcow2 images from raw images,
raw from qcow2, or even qcow2 from qcow2 and raw from raw to re-optimize
the disk.

This re-uses Ori's vioqcow2.c from vmd by reaching into it and
compiling it in. The API has been adjusted to be used from both vmctl
and vmd accordingly.

OK mlarkin@


Revision tags: OPENBSD_6_4_BASE
# 1.31 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.
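
The lookup order described above can be sketched as follows. This is a simplified model, not vioqcow2.c: each "cluster" is a single byte, allocation is tracked with a flag array, and the struct and function names are hypothetical. Reading an unallocated cluster as zero when no layer holds it is an assumption based on how qcow2 images generally behave.

```c
#include <stddef.h>

struct ex_image {
	const struct ex_image *base;	/* NULL when there is no base image */
	const unsigned char *data;	/* one byte per cluster in this sketch */
	const unsigned char *present;	/* non-zero if the cluster is allocated */
	size_t nclusters;
};

/*
 * Start in the derived image and fall through to each base image
 * until some layer has the cluster allocated.
 */
static unsigned char
ex_read_cluster(const struct ex_image *img, size_t idx)
{
	for (; img != NULL; img = img->base) {
		if (idx < img->nclusters && img->present[idx])
			return img->data[idx];
	}
	return 0;	/* allocated nowhere: assumed to read as zero */
}
```

Because the derived image stores only the clusters that differ, changing the base image changes what every unallocated lookup returns, which is why modifying the base corrupts the derived image.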

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.30 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.29 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.28 09-Sep-2018 ccardenas

Add initial qcow2 image support.

Users are able to declare disk images as 'raw' or 'qcow2' using either
vmctl or vm.conf. The default disk image format is 'raw' if not specified.

Examples of using disk format:

    vmctl start bsd -Lc -r cd64.iso -d qcow2:current.qc2
or
    vmctl start bsd -Lc -r cd64.iso -d raw:current.raw
is equivalent to
    vmctl start bsd -Lc -r cd64.iso -d current.raw

in vm.conf

    vm "current" {
        disable
        memory 2G
        disk "/home/user/vmm/current.qc2" format "qcow2"
        interface { switch "external" }
    }

or

    vm "current" {
        disable
        memory 2G
        disk "/home/user/vmm/current.raw" format "raw"
        interface { switch "external" }
    }

is equivalent to

    vm "current" {
        disable
        memory 2G
        disk "/home/user/vmm/current.raw"
        interface { switch "external" }
    }

Tested by many.

Big Thanks to Ori Bernstein.


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and cleanup some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in its respective function (vioscsi_handle_xxx),
which makes it easier to add more functionality.

No functional changes confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
  CentOS 6 (secondary), Ubuntu 17.10 (secondary).
  NOTE: Secondary indicates some issue(s) preventing full/reliable
  functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
  default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.17 08-May-2017 reyk

Adds functions to read and write state of devices in vmd.

This is required for implementing vmctl send and vmctl receive. vmctl
send / receive are two new options that will support snapshotting VMs
and migrating VMs from one host to another. The atomicio files are
copied from usr.bin/ssh.

Patch from Pratik Vyas; this project was undertaken at San Jose State
University along with his three teammates, Ashwin, Harshada and Siri
with mlarkin@ as the advisor.

OK mlarkin@


# 1.16 02-May-2017 mlarkin

Resynchronize the guest RTC via vmmci(4) on host resume from zzz/ZZZ
(vmd part)

This feature is for OpenBSD guests only.

ok reyk, kettenis


# 1.15 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.
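
For illustration only: a /31 contains exactly two addresses that differ in their lowest bit, so the address handed to the VM can be derived from the one on the tap(4) interface. The helper name ex_slash31_peer is hypothetical, and which of the two addresses vmd assigns to the host side is not stated here.

```c
#include <stdint.h>

/*
 * Given one IPv4 address of a /31 (in host byte order), return the
 * other address in the same subnet by flipping the lowest bit.
 */
static uint32_t
ex_slash31_peer(uint32_t addr_host_order)
{
	return addr_host_order ^ 1U;
}
```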

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.14 27-Mar-2017 deraadt

die whitespace die die die


# 1.13 26-Mar-2017 mlarkin

Implement a missing command in vioblk and allow > MAXPHYS transfers.

This diff (with the others previously committed) allows ubuntu 14.04
amd64 guests to work.


# 1.12 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.11 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.10 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.9 21-Jan-2017 mlarkin

updated include paths for recently moved virtio stuff


# 1.8 19-Jan-2017 reyk

Export the host time to the guest, add it as a timedelta sensor in vmmci(4)

OK kettenis@ mlarkin@


# 1.7 13-Jan-2017 reyk

Add host side of vmmci(4) to vmd(8).

It currently uses the device to request graceful shutdown of a VM on
"vmctl stop myvm" but will be extended for reboot and other edge cases.

OK mlarkin@


# 1.6 12-Oct-2016 mlarkin

Allow 4 vio(4) interfaces in each VM. Also fix a bad interrupt assignment that
caused IRQ9 to be shared between the second disk device and the vio(4)s,
which caused poor network performance.

ok reyk, stefan


# 1.5 02-Sep-2016 stefan

Process incoming host->guest packets asynchronously to running VCPU

This registers a handler with libevent that is called on incoming packets
for the guest. If they cannot be handled immediately (because the virtq is
full), make sure they are handled on VCPU exits.

ok mlarkin@


Revision tags: OPENBSD_6_0_BASE
# 1.4 09-Jul-2016 stefan

Prepare vionet to be handled asynchronously to the VCPU thread

This splits the handling of received data into a separate function
that can later be called in parallel to the VCPU thread instead of
handling received packets on VCPU exits only.

It also makes virtq accesses in the rx path safe to run in parallel
to the VCPU thread: the last index into the 'avail' ring the driver
has notified to the host is kept track of. It also makes sure that
the host only writes back to the 'used' ring instead of modifying
the whole receive virtq.
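
A small sketch of the index bookkeeping: the driver's ring index is a free-running 16-bit counter, so the number of buffers not yet consumed is the wrapped difference between it and the last index the device processed. The function name ex_avail_pending is hypothetical and not taken from vmd.

```c
#include <stdint.h>

/*
 * Number of entries the driver has published that the device has not
 * yet consumed. The subtraction is truncated to 16 bits so the result
 * stays correct when the counter wraps from 65535 to 0.
 */
static uint16_t
ex_avail_pending(uint16_t avail_idx, uint16_t last_avail)
{
	return (uint16_t)(avail_idx - last_avail);
}
```

Keeping only this private index on the device side means the rx path never needs to copy or rewrite the ring the driver owns.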

While there, describe what virtio_vq_info and virtio_io_cfg are used
for, as suggested by mlarkin@

ok mlarkin@


Revision tags: OPENBSD_5_9_BASE
# 1.3 03-Dec-2015 reyk

spacing


# 1.2 22-Nov-2015 reyk

Add $ Ids


# 1.1 22-Nov-2015 mlarkin

vmd(8) - virtual machine daemon.

There is still a lot to be done, and fixed, in these userland components
but I have received enough "it works, commit it" emails that it's time
to finish those things in tree.

discussed with many, tested by many.


# 1.42 04-May-2022 dv

vmctl(8)/vmd(8): convert disk sizes from MB to bytes

Continue converting other parts to storing data in bytes instead
of MB. In this case, the logic for disk sizes was being scaled.

This fixes issues reported by Martin Vahlensieck where vmctl could
no longer create disks larger than 7 MiB after previous commits to
change storing memory sizes as bytes.

While this keeps the vm memory limit check in vmctl's size parser,
it skips the limit check for disks. The error messages adjust
accordingly and this removes the double error message logging.

Update comments and function types accordingly.

ok marlkin@


Revision tags: OPENBSD_7_0_BASE OPENBSD_7_1_BASE
# 1.41 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.40 21-Jun-2021 dv

vmd(8): support variable length vionet rx descriptor chains

The original implementation of the virtio network device assumed a
driver would only provide a 2-descriptor chain for receiving packets.
The virtio spec allows for variable length chains and drivers, in
practice, construct them when they use a sufficiently large MTU.

This change lets the device use variable length chains provided by
the driver, thus allowing for drivers to set an MTU up to the
underlying host-side tap(4)'s limit of TUNMRU (16384).

Size limitations are now enforced on both tx and rx-side dropping
anything violating the underlying tap(4) min and max limits.

More work is needed to increase the read(2) buffer in use by vmd
to prevent packet truncation.

OK mlarkin@


# 1.39 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


# 1.38 21-Apr-2021 dv

Fix packet size checks and remove bad casts.

Because dhcpsz was an uninitialized ssize_t, it was possible that a
garbage "packet" would be queued on the receiving end of the virtio
network device.

Change the type to size_t and add proper checks based on it being
greater than zero. Remove the cast of ssize_t to uint64_t that also
caused garbage sizes when dhcpsz was unintialized and set at runtime
to something < 0.


Revision tags: OPENBSD_6_9_BASE
# 1.37 29-Mar-2021 dv

branches: 1.37.2;
Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.36 07-Jan-2021 tracey

bump VM shutdown event timeout ok mlarkin@ stsp@ florian@

VMs with addition package daemons were not given enough time to shutdown
gracefully.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.35 11-Dec-2019 pd

branches: 1.35.6;
vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.34 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


# 1.33 26-Nov-2018 reyk

Move the {qcow2,raw} create functions from vmctl into vmd/vio{qcow2,raw}.c

This way they are in the appropriate place and code can be shared with vmd.

Ok ori@ mlarkin@ ccardenas@


# 1.32 19-Oct-2018 reyk

Add support to create and convert disk images from existing images

The -i option to vmctl create (eg. vmctl create output.qcow2 -i input.img)
lets you create a new image from an input file and convert it if it is a
different format. This makes it possible to create qcow2 images from raw images,
raw from qcow2, or even qcow2 from qcow2 and raw from raw to re-optimize
the disk.

This re-uses Ori's vioqcow2.c from vmd by reaching into it and
compiling it in. The API has been adjusted to be used from both vmctl
and vmd accordingly.

OK mlarkin@


Revision tags: OPENBSD_6_4_BASE
# 1.31 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.
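
The lookup order can be sketched as follows. This is a toy in-memory
model with hypothetical names (struct image, read_cluster), not the
qcow2 on-disk format or vmd's code:

```c
#include <stddef.h>

#define NCLUSTERS 4

/* Hypothetical image: a cluster is allocated when present[i] != 0. */
struct image {
	const struct image	*base;	/* NULL for a base image */
	int			 present[NCLUSTERS];
	char			 data[NCLUSTERS];
};

/*
 * Walk derived -> base until some image holds the cluster.
 * Clusters allocated nowhere read back as zero.
 */
char
read_cluster(const struct image *img, size_t idx)
{
	if (idx >= NCLUSTERS)
		return 0;
	for (; img != NULL; img = img->base)
		if (img->present[idx])
			return img->data[idx];
	return 0;
}
```

Because reads fall through to the base, changing the base afterwards
changes what every derived image sees for its unallocated clusters.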

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.30 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.29 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.28 09-Sep-2018 ccardenas

Add initial qcow2 image support.

Users are able to declare disk images as 'raw' or 'qcow2' using either
vmctl or vm.conf. The default disk image format is 'raw' if not specified.

Examples of using disk format:

vmctl start bsd -Lc -r cd64.iso -d qcow2:current.qc2
or
vmctl start bsd -Lc -r cd64.iso -d raw:current.raw
is equivalent to
vmctl start bsd -Lc -r cd64.iso -d current.raw

in vm.conf
vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.qc2" format "qcow2"
	interface { switch "external" }
}

or

vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.raw" format "raw"
	interface { switch "external" }
}

is equivalent to

vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.raw"
	interface { switch "external" }
}

Tested by many.

Big Thanks to Ori Bernstein.


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and cleanup some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in the respective function (vioscsi_handle_xxx)
which allows more functionality to be added more easily.

No functional changes confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.17 08-May-2017 reyk

Adds functions to read and write state of devices in vmd.

This is required for implementing vmctl send and vmctl receive. vmctl
send / receive are two new options that will support snapshotting VMs
and migrating VMs from one host to another. The atomicio files are
copied from usr.bin/ssh.

Patch from Pratik Vyas; this project was undertaken at San Jose State
University along with his three teammates, Ashwin, Harshada and Siri
with mlarkin@ as the advisor.

OK mlarkin@


# 1.16 02-May-2017 mlarkin

Resynchronize the guest RTC via vmmci(4) on host resume from zzz/ZZZ
(vmd part)

This feature is for OpenBSD guests only.

ok reyk, kettenis


# 1.15 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.
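
In a /31 the two usable addresses differ only in the lowest bit, so
either side can be derived from the other. A small sketch, assuming the
address is held in host byte order (the function name is hypothetical):

```c
#include <stdint.h>

/* Return the other address of the /31 pair that contains addr. */
uint32_t
slash31_peer(uint32_t addr)
{
	return addr ^ 1U;
}
```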

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.14 27-Mar-2017 deraadt

die whitespace die die die


# 1.13 26-Mar-2017 mlarkin

Implement a missing command in vioblk and allow > MAXPHYS transfers.

This diff (with the others previously committed) allows ubuntu 14.04
amd64 guests to work.


# 1.12 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.11 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.10 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch; the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.9 21-Jan-2017 mlarkin

updated include paths for recently moved virtio stuff


# 1.8 19-Jan-2017 reyk

Export the host time to the guest, add it as a timedelta sensor in vmmci(4)

OK kettenis@ mlarkin@


# 1.7 13-Jan-2017 reyk

Add host side of vmmci(4) to vmd(8).

It currently uses the device to request graceful shutdown of a VM on
"vmctl stop myvm" but will be extended for reboot and other edge cases.

OK mlarkin@


# 1.6 12-Oct-2016 mlarkin

Allow 4 vio(4) interfaces in each VM. Also fix a bad interrupt assignment that
caused IRQ9 to be shared between the second disk device and the vio(4)s,
which caused poor network performance.

ok reyk, stefan


# 1.5 02-Sep-2016 stefan

Process incoming host->guest packets asynchronously to running VCPU

This registers a handler with libevent that is called on incoming packets
for the guest. If they cannot be handled immediately (because the virtq is
full), make sure they are handled on VCPU exits.

ok mlarkin@


Revision tags: OPENBSD_6_0_BASE
# 1.4 09-Jul-2016 stefan

Prepare vionet to be handled asynchronously to the VCPU thread

This splits the handling of received data into a separate function
that can later be called in parallel to the VCPU thread instead of
handling received packets on VCPU exits only.

It also makes virtq accesses in the rx path safe to run in parallel
to the VCPU thread: the last index into the 'avail' ring the driver
has notified to the host is kept track of. It also makes sure that
the host only writes back to the 'avail' ring instead of modifying
the whole receive virtq.
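
Tracking the last seen 'avail' index can be sketched like this. The
ring index is a free-running 16-bit counter in the virtio spec; the
function name is hypothetical:

```c
#include <stdint.h>

/*
 * Number of buffers the driver has published that the device has not
 * yet consumed. 16-bit arithmetic handles index wraparound.
 */
uint16_t
pending_buffers(uint16_t avail_idx, uint16_t last_avail)
{
	return (uint16_t)(avail_idx - last_avail);
}
```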

While there, describe what virtio_vq_info and virtio_io_cfg are used
for, as suggested by mlarkin@

ok mlarkin@


Revision tags: OPENBSD_5_9_BASE
# 1.3 03-Dec-2015 reyk

spacing


# 1.2 22-Nov-2015 reyk

Add $ Ids


# 1.1 22-Nov-2015 mlarkin

vmd(8) - virtual machine daemon.

There is still a lot to be done, and fixed, in these userland components
but I have received enough "it works, commit it" emails that it's time
to finish those things in tree.

discussed with many, tested by many.


# 1.41 16-Jul-2021 dv

vmd(8): simplify vcpu logic, removing uart & vionet reads

Remove legacy state handling on the ns8250 and virtio network devices
originally put in place before using libevent for async device
events. The vcpu thread doesn't need to process device data as it is
handled by the libevent thread.

This has the benefit of simplifying some of the message passing
between threads introduced to the ns8250 uart since both the vcpu
and libevent threads were processing read events.

No functional change intended. Tested by many, including abieber@,
weerd@, Mischa Peters, and Matthias Schmidt. (Thanks.)

OK mlarkin@


# 1.40 21-Jun-2021 dv

vmd(8): support variable length vionet rx descriptor chains

The original implementation of the virtio network device assumed a
driver would only provide a 2-descriptor chain for receiving packets.
The virtio spec allows for variable length chains and drivers, in
practice, construct them when they use a sufficiently large MTU.

This change lets the device use variable length chains provided by
the driver, thus allowing for drivers to set an MTU up to the
underlying host-side tap(4)'s limit of TUNMRU (16384).

Size limitations are now enforced on both tx and rx-side dropping
anything violating the underlying tap(4) min and max limits.
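
Walking a variable length chain might look like the sketch below. The
descriptor layout and the NEXT flag follow the virtio spec; the function
name, the MAX_PKT constant, and the loop guard are assumptions for
illustration rather than vmd's implementation:

```c
#include <stdint.h>

#define DESC_F_NEXT	1	/* same value as VRING_DESC_F_NEXT */
#define MAX_PKT		16384	/* TUNMRU, per the text above */

struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Sum the buffer lengths of the chain starting at head.
 * Returns -1 if an index is out of range or the chain loops.
 */
int64_t
chain_capacity(const struct desc *table, uint16_t qsize, uint16_t head)
{
	uint64_t total = 0;
	uint16_t idx = head, hops = 0;

	for (;;) {
		if (idx >= qsize || hops++ >= qsize)
			return -1;
		total += table[idx].len;
		if (!(table[idx].flags & DESC_F_NEXT))
			break;
		idx = table[idx].next;
	}
	return (int64_t)total;
}
```

A receive path could then drop any packet larger than either the chain
capacity or MAX_PKT.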

More work is needed to increase the read(2) buffer in use by vmd
to prevent packet truncation.

OK mlarkin@


# 1.39 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@



# 1.40 21-Jun-2021 dv

vmd(8): support variable length vionet rx descriptor chains

The original implementation of the virtio network device assumed a
driver would only provide a 2-descriptor chain for receiving packets.
The virtio spec allows for variable length chains and drivers, in
practice, construct them when they use a sufficiently large MTU.

This change lets the device use variable length chains provided by
the driver, thus allowing for drivers to set an MTU up to the
underlying host-side tap(4)'s limit of TUNMRU (16384).

Size limitations are now enforced on both tx and rx-side dropping
anything violating the underlying tap(4) min and max limits.

More work is needed to increase the read(2) buffer in use by vmd
to prevent packet truncation.

OK mlarkin@


# 1.39 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth other the years lead to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


# 1.38 21-Apr-2021 dv

Fix packet size checks and remove bad casts.

Because dhcpsz was an uninitialized ssize_t, it was possible that a
garbage "packet" would be queued on the receiving end of the virtio
network device.

Change the type to size_t and add proper checks based on it being
greater than zero. Remove the cast of ssize_t to uint64_t that also
caused garbage sizes when dhcpsz was unintialized and set at runtime
to something < 0.


Revision tags: OPENBSD_6_9_BASE
# 1.37 29-Mar-2021 dv

branches: 1.37.2;
Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.36 07-Jan-2021 tracey

bump VM shutdown event timeout ok mlarkin@ stsp@ florian@

VMs with addition package daemons were not given enough time to shutdown
gracefully.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.35 11-Dec-2019 pd

branches: 1.35.6;
vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.34 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


# 1.33 26-Nov-2018 reyk

Move the {qcow2,raw} create functions from vmctl into vmd/vio{qcow2,raw}.c

This way they are in the appropriate place and code can be shared with vmd.

Ok ori@ mlarkin@ ccardenas@


# 1.32 19-Oct-2018 reyk

Add support to create and convert disk images from existing images

The -i option to vmctl create (eg. vmctl create output.qcow2 -i input.img)
lets you create a new image from an input file and convert it if it is a
different format. This allows to convert qcow2 images from raw images,
raw from qcow2, or even qcow2 from qcow2 and raw from raw to re-optimize
the disk.

This re-uses Ori's vioqcow2.c from vmd by reaching into it and
compiling it in. The API has been adjust to be used from both vmctl
and vmd accordingly.

OK mlarkin@


Revision tags: OPENBSD_6_4_BASE
# 1.31 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.30 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.29 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.28 09-Sep-2018 ccardenas

Add initial qcow2 image support.

Users are able to declare disk images as 'raw' or 'qcow2' using either
vmctl and vm.conf. The default disk image format is 'raw' if not specified.

Examples of using disk format:

vmctl start bsd -Lc -r cd64.iso -d qcow2:current.qc2
or
vmctl start bsd -Lc -r cd64.iso -d raw:current.raw
is equivalent to
vmctl start bsd -Lc -r cd64.iso -d current.raw

in vm.conf
vm "current" {
disable
memory 2G
disk "/home/user/vmm/current.qc2" format "qcow2"
interface { switch "external" }
}

or

vm "current" {
disable
memory 2G
disk "/home/user/vmm/current.raw" format "raw"
interface { switch "external" }
}

is equivlanet to

vm "current" {
disable
memory 2G
disk "/home/user/vmm/current.raw"
interface { switch "external" }
}

Tested by many.

Big Thanks to Ori Bernstein.


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and cleanup some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in the respective function (vioscsi_handle_xxx)
which allows more functionality to be added easier.

No functional changes confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.17 08-May-2017 reyk

Adds functions to read and write state of devices in vmd.

This is required for implementing vmctl send and vmctl receive. vmctl
send / receive are two new options that will support snapshotting VMs
and migrating VMs from one host to another. The atomicio files are
copied from usr.bin/ssh.

Patch from Pratik Vyas; this project was undertaken at San Jose State
University along with his three teammates, Ashwin, Harshada and Siri
with mlarkin@ as the advisor.

OK mlarkin@


# 1.16 02-May-2017 mlarkin

Resynchronize the guest RTC via vmmci(4) on host resume from zzz/ZZZ
(vmd part)

This feature is for OpenBSD guests only.

ok reyk, kettenis


# 1.15 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.14 27-Mar-2017 deraadt

die whitespace die die die


# 1.13 26-Mar-2017 mlarkin

Implement a missing command in vioblk and allow > MAXPHYS transfers.

This diff (with the others previously committed) allows ubuntu 14.04
amd64 guests to work.


# 1.12 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.11 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. The use of timeouts and VM ACKs makes sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.10 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch; the
implementation is independent of the underlying switch or bridge.

no objections mlarkin@


# 1.9 21-Jan-2017 mlarkin

updated include paths for recently moved virtio stuff


# 1.8 19-Jan-2017 reyk

Export the host time to the guest, add it as a timedelta sensor in vmmci(4)

OK kettenis@ mlarkin@


# 1.7 13-Jan-2017 reyk

Add host side of vmmci(4) to vmd(8).

It currently uses the device to request graceful shutdown of a VM on
"vmctl stop myvm" but will be extended for reboot and other edge cases.

OK mlarkin@


# 1.6 12-Oct-2016 mlarkin

Allow 4 vio(4) interfaces in each VM. Also fix a bad interrupt assignment that
caused IRQ9 to be shared between the second disk device and the vio(4)s,
which caused poor network performance.

ok reyk, stefan


# 1.5 02-Sep-2016 stefan

Process incoming host->guest packets asynchronously to running VCPU

This registers a handler with libevent that is called on incoming packets
for the guest. If they cannot be handled immediately (because the virtq is
full), make sure they are handled on VCPU exits.

ok mlarkin@


Revision tags: OPENBSD_6_0_BASE
# 1.4 09-Jul-2016 stefan

Prepare vionet to be handled asynchronously to the VCPU thread

This splits the handling of received data into a separate function
that can later be called in parallel to the VCPU thread instead of
handling received packets on VCPU exits only.

It also makes virtq accesses in the rx path safe to run in parallel
to the VCPU thread: the last index into the 'avail' ring the driver
has notified to the host is kept track of. It also makes sure that
the host only writes back to the 'avail' ring instead of modifying
the whole receive virtq.
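
As a rough illustration of tracking the last consumed 'avail' index, the
C sketch below keeps a free-running 16-bit counter on the host side and
compares it with the index published by the driver. The names rxq_state,
rxq_pending and rxq_take are hypothetical and are not taken from vmd(8);
the queue size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-queue host state; not vmd's actual structure. */
struct rxq_state {
	uint16_t last_avail;	/* next 'avail' index the host will consume */
};

/*
 * Entries the driver has published that the host has not consumed yet.
 * The 16-bit subtraction stays correct when the indices wrap around.
 */
uint16_t
rxq_pending(const struct rxq_state *q, uint16_t avail_idx)
{
	return (uint16_t)(avail_idx - q->last_avail);
}

/*
 * Take one pending entry. Stores the ring slot in *slot and returns 1,
 * or returns 0 if nothing is pending.
 */
int
rxq_take(struct rxq_state *q, uint16_t avail_idx, uint16_t qsize,
    uint16_t *slot)
{
	/* Assumption: qsize is a non-zero power of two. */
	assert(qsize != 0 && (qsize & (qsize - 1)) == 0);

	if (rxq_pending(q, avail_idx) == 0)
		return 0;
	*slot = q->last_avail % qsize;
	q->last_avail++;
	return 1;
}
```

Because only last_avail is private host state, a receive handler written
this way does not need to touch the rest of the virtq to know how much
work is outstanding.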

While there, describe what virtio_vq_info and virtio_io_cfg are used
for, as suggested by mlarkin@

ok mlarkin@


Revision tags: OPENBSD_5_9_BASE
# 1.3 03-Dec-2015 reyk

spacing


# 1.2 22-Nov-2015 reyk

Add $ Ids


# 1.1 22-Nov-2015 mlarkin

vmd(8) - virtual machine daemon.

There is still a lot to be done, and fixed, in these userland components
but I have received enough "it works, commit it" emails that it's time
to finish those things in tree.

discussed with many, tested by many.


# 1.39 16-Jun-2021 dv

cleanup vmd(8) includes and header files

Lots of organic growth over the years led to unnecessary includes
(proc.h everywhere) and odd dependencies between header files. This
cleans things up a bit to help with upcoming cleanup around dhcp
code.

No functional change.

"go for it" mlarkin@


# 1.38 21-Apr-2021 dv

Fix packet size checks and remove bad casts.

Because dhcpsz was an uninitialized ssize_t, it was possible that a
garbage "packet" would be queued on the receiving end of the virtio
network device.

Change the type to size_t and add proper checks based on it being
greater than zero. Remove the cast of ssize_t to uint64_t that also
caused garbage sizes when dhcpsz was uninitialized and set at runtime
to something < 0.
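
A minimal C sketch of why such a cast produces garbage sizes, assuming
nothing about vmd's actual code beyond the types named above. The helper
names cast_like_before and have_packet are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical helper mimicking the removed cast: a negative ssize_t
 * converted to uint64_t wraps to a very large unsigned value.
 */
uint64_t
cast_like_before(ssize_t sz)
{
	assert(sizeof(ssize_t) <= sizeof(uint64_t));
	return (uint64_t)sz;
}

/*
 * Hypothetical helper mirroring the described fix: with an unsigned
 * size_t, only a value greater than zero counts as a packet to queue.
 */
int
have_packet(size_t sz)
{
	return sz > 0;
}
```

With these helpers, cast_like_before(-1) yields UINT64_MAX, while
have_packet(0) reports that there is nothing to queue.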


Revision tags: OPENBSD_6_9_BASE
# 1.37 29-Mar-2021 dv

branches: 1.37.2;
Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.36 07-Jan-2021 tracey

bump VM shutdown event timeout ok mlarkin@ stsp@ florian@

VMs with additional package daemons were not given enough time to shut down
gracefully.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.35 11-Dec-2019 pd

branches: 1.35.6;
vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block, also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.34 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


# 1.33 26-Nov-2018 reyk

Move the {qcow2,raw} create functions from vmctl into vmd/vio{qcow2,raw}.c

This way they are in the appropriate place and code can be shared with vmd.

Ok ori@ mlarkin@ ccardenas@


# 1.32 19-Oct-2018 reyk

Add support to create and convert disk images from existing images

The -i option to vmctl create (e.g. vmctl create output.qcow2 -i input.img)
lets you create a new image from an input file and convert it if it is a
different format. This allows creating qcow2 images from raw images,
raw from qcow2, or even qcow2 from qcow2 and raw from raw to re-optimize
the disk.

This re-uses Ori's vioqcow2.c from vmd by reaching into it and
compiling it in. The API has been adjusted to be used from both vmctl
and vmd accordingly.

OK mlarkin@


Revision tags: OPENBSD_6_4_BASE
# 1.31 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.
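
The lookup order described above can be sketched in C as follows. This
is an illustrative in-memory model only: struct image, NCLUSTERS and
read_cluster are hypothetical names and do not reflect vmd's qcow2 code
or the on-disk format.

```c
#include <assert.h>
#include <stddef.h>

#define NCLUSTERS 4

/* Hypothetical stand-in for a disk image; one byte per cluster. */
struct image {
	const struct image *base;	/* base image, or NULL if none */
	int present[NCLUSTERS];		/* nonzero if cluster is stored here */
	char data[NCLUSTERS];
};

/*
 * Read one cluster: start in the derived image and fall through to
 * each base image in turn. In this model, a cluster found nowhere in
 * the chain reads as zero.
 */
char
read_cluster(const struct image *img, size_t idx)
{
	assert(idx < NCLUSTERS);

	for (; img != NULL; img = img->base)
		if (img->present[idx])
			return img->data[idx];
	return 0;
}
```

The model also shows the limitation noted above: any cluster the derived
image does not hold is read straight from the base, so changing the base
changes what the derived image returns.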

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.30 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.29 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.28 09-Sep-2018 ccardenas

Add initial qcow2 image support.

Users are able to declare disk images as 'raw' or 'qcow2' using either
vmctl or vm.conf. The default disk image format is 'raw' if not specified.

Examples of using disk format:

    vmctl start bsd -Lc -r cd64.iso -d qcow2:current.qc2
or
    vmctl start bsd -Lc -r cd64.iso -d raw:current.raw
is equivalent to
    vmctl start bsd -Lc -r cd64.iso -d current.raw

in vm.conf

    vm "current" {
        disable
        memory 2G
        disk "/home/user/vmm/current.qc2" format "qcow2"
        interface { switch "external" }
    }

or

    vm "current" {
        disable
        memory 2G
        disk "/home/user/vmm/current.raw" format "raw"
        interface { switch "external" }
    }

is equivalent to

    vm "current" {
        disable
        memory 2G
        disk "/home/user/vmm/current.raw"
        interface { switch "external" }
    }

Tested by many.

Big Thanks to Ori Bernstein.


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and cleanup some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in the respective function (vioscsi_handle_xxx)
which allows more functionality to be added more easily.

No functional changes confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
  CentOS 6 (secondary), Ubuntu 17.10 (secondary).
  NOTE: Secondary indicates some issue(s) preventing full/reliable
  functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
  default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.17 08-May-2017 reyk

Adds functions to read and write state of devices in vmd.

This is required for implementing vmctl send and vmctl receive. vmctl
send / receive are two new options that will support snapshotting VMs
and migrating VMs from one host to another. The atomicio files are
copied from usr.bin/ssh.

Patch from Pratik Vyas; this project was undertaken at San Jose State
University along with his three teammates, Ashwin, Harshada and Siri
with mlarkin@ as the advisor.

OK mlarkin@


# 1.16 02-May-2017 mlarkin

Resynchronize the guest RTC via vmmci(4) on host resume from zzz/ZZZ
(vmd part)

This feature is for OpenBSD guests only.

ok reyk, kettenis


# 1.15 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.14 27-Mar-2017 deraadt

die whitespace die die die


# 1.13 26-Mar-2017 mlarkin

Implement a missing command in vioblk and allow > MAXPHYS transfers.

This diff (with the others previously committed) allows ubuntu 14.04
amd64 guests to work.


# 1.12 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.11 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.10 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.9 21-Jan-2017 mlarkin

updated include paths for recently moved virtio stuff


# 1.8 19-Jan-2017 reyk

Export the host time to the guest, add it as a timedelta sensor in vmmci(4)

OK kettenis@ mlarkin@


# 1.7 13-Jan-2017 reyk

Add host side of vmmci(4) to vmd(8).

It currently uses the device to request graceful shutdown of a VM on
"vmctl stop myvm" but will be extended for reboot and a other edge cases.

OK mlarkin@


# 1.6 12-Oct-2016 mlarkin

Allow 4 vio(4) interfaces in each VM. Also fix a bad interrupt assignment that
caused IRQ9 to be shared between the second disk device and the vio(4)s,
which caused poor network performance.

ok reyk, stefan


# 1.5 02-Sep-2016 stefan

Process incoming host->guest packets asynchronously to running VCPU

This registers a handler with libevent that is called on incoming packets
for the guest. If they cannot be handled immediately (because the virtq is
full), make sure they are handled on VCPU exits.

ok mlarkin@


Revision tags: OPENBSD_6_0_BASE
# 1.4 09-Jul-2016 stefan

Prepare vionet to be handled asynchronously to the VCPU thread

This splits the handling of received data into a separate function
that can later be called in parallel to the VCPU thread instead of
handling received packets on VCPU exits only.

It also makes virtq accesses in the rx path safe to run in parallel
to the VCPU thread: the last index into the 'avail' ring the driver
has notified to the host is kept track of. It also makes sure that
the host only writes back to the 'avail' ring instead of modifying
the whole receive virtq.

While there, describe what virtio_vq_info and virtio_io_cfg are used
for, as suggested by mlarkin@

ok mlarkin@


Revision tags: OPENBSD_5_9_BASE
# 1.3 03-Dec-2015 reyk

spacing


# 1.2 22-Nov-2015 reyk

Add $ Ids


# 1.1 22-Nov-2015 mlarkin

vmd(8) - virtual machine daemon.

There is still a lot to be done, and fixed, in these userland components
but I have received enough "it works, commit it" emails that it's time
to finish those things in tree.

discussed with many, tested by many.


# 1.37 29-Mar-2021 dv

Propagate host-side tap(4) lladdr to guest vm process to allow unicast dhcp
and bootp renewals with vmd(8)'s built-in dhcp server. Previous behavior
ignored did not intercept these packets and instead transmitted them.

This should make vmd(8)'s dhcp behave more as a true dhcp server should and
allows it to work properly with the new dhcpleased(8) attempting a renewal.

OK mlarkin@


# 1.36 07-Jan-2021 tracey

bump VM shutdown event timeout ok mlarkin@ stsp@ florian@

VMs with addition package daemons were not given enough time to shutdown
gracefully.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.35 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.34 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


# 1.33 26-Nov-2018 reyk

Move the {qcow2,raw} create functions from vmctl into vmd/vio{qcow2,raw}.c

This way they are in the appropriate place and code can be shared with vmd.

Ok ori@ mlarkin@ ccardenas@


# 1.32 19-Oct-2018 reyk

Add support to create and convert disk images from existing images

The -i option to vmctl create (eg. vmctl create output.qcow2 -i input.img)
lets you create a new image from an input file and convert it if it is a
different format. This allows to convert qcow2 images from raw images,
raw from qcow2, or even qcow2 from qcow2 and raw from raw to re-optimize
the disk.

This re-uses Ori's vioqcow2.c from vmd by reaching into it and
compiling it in. The API has been adjust to be used from both vmctl
and vmd accordingly.

OK mlarkin@


Revision tags: OPENBSD_6_4_BASE
# 1.31 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.30 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.29 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.28 09-Sep-2018 ccardenas

Add initial qcow2 image support.

Users are able to declare disk images as 'raw' or 'qcow2' using either
vmctl and vm.conf. The default disk image format is 'raw' if not specified.

Examples of using disk format:

vmctl start bsd -Lc -r cd64.iso -d qcow2:current.qc2
or
vmctl start bsd -Lc -r cd64.iso -d raw:current.raw
is equivalent to
vmctl start bsd -Lc -r cd64.iso -d current.raw

in vm.conf
vm "current" {
disable
memory 2G
disk "/home/user/vmm/current.qc2" format "qcow2"
interface { switch "external" }
}

or

vm "current" {
disable
memory 2G
disk "/home/user/vmm/current.raw" format "raw"
interface { switch "external" }
}

is equivlanet to

vm "current" {
disable
memory 2G
disk "/home/user/vmm/current.raw"
interface { switch "external" }
}

Tested by many.

Big Thanks to Ori Bernstein.


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and cleanup some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in the respective function (vioscsi_handle_xxx)
which allows more functionality to be added easier.

No functional changes confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.17 08-May-2017 reyk

Adds functions to read and write state of devices in vmd.

This is required for implementing vmctl send and vmctl receive. vmctl
send / receive are two new options that will support snapshotting VMs
and migrating VMs from one host to another. The atomicio files are
copied from usr.bin/ssh.

Patch from Pratik Vyas; this project was undertaken at San Jose State
University along with his three teammates, Ashwin, Harshada and Siri
with mlarkin@ as the advisor.

OK mlarkin@


# 1.16 02-May-2017 mlarkin

Resynchronize the guest RTC via vmmci(4) on host resume from zzz/ZZZ
(vmd part)

This feature is for OpenBSD guests only.

ok reyk, kettenis


# 1.15 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@
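
As an illustration of the /31 scheme described above, a minimal sketch
follows. It is not vmd's code: the helper name, the struct, and the choice
of giving the even address to the host and the odd one to the guest are
assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from vmd: pick the idx-th /31 inside
 * a base prefix (addresses in host byte order). Assumption: the even
 * address is configured on the host's tap(4) interface and the odd
 * address is offered to the guest via BOOTP.
 */
struct local_pair {
	uint32_t host;	/* address for the tap(4) side */
	uint32_t guest;	/* address handed to the VM */
	uint32_t mask;	/* /31 netmask */
};

struct local_pair
local_pair_for(uint32_t base, uint32_t idx)
{
	struct local_pair p;

	p.mask = 0xfffffffeU;
	p.host = (base & p.mask) + (idx << 1);
	p.guest = p.host | 1U;
	return p;
}
```

Both addresses share the same /31, so the guest can reach the host side
directly and needs nothing beyond the gateway it is given.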


Revision tags: OPENBSD_6_1_BASE
# 1.14 27-Mar-2017 deraadt

die whitespace die die die


# 1.13 26-Mar-2017 mlarkin

Implement a missing command in vioblk and allow > MAXPHYS transfers.

This diff (with the others previously committed) allows ubuntu 14.04
amd64 guests to work.


# 1.12 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.11 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.10 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.9 21-Jan-2017 mlarkin

updated include paths for recently moved virtio stuff


# 1.8 19-Jan-2017 reyk

Export the host time to the guest, add it as a timedelta sensor in vmmci(4)

OK kettenis@ mlarkin@


# 1.7 13-Jan-2017 reyk

Add host side of vmmci(4) to vmd(8).

It currently uses the device to request graceful shutdown of a VM on
"vmctl stop myvm" but will be extended for reboot and other edge cases.

OK mlarkin@


# 1.6 12-Oct-2016 mlarkin

Allow 4 vio(4) interfaces in each VM. Also fix a bad interrupt assignment that
caused IRQ9 to be shared between the second disk device and the vio(4)s,
which caused poor network performance.

ok reyk, stefan


# 1.5 02-Sep-2016 stefan

Process incoming host->guest packets asynchronously to running VCPU

This registers a handler with libevent that is called on incoming packets
for the guest. If they cannot be handled immediately (because the virtq is
full), make sure they are handled on VCPU exits.

ok mlarkin@


Revision tags: OPENBSD_6_0_BASE
# 1.4 09-Jul-2016 stefan

Prepare vionet to be handled asynchronously to the VCPU thread

This splits the handling of received data into a separate function
that can later be called in parallel to the VCPU thread instead of
handling received packets on VCPU exits only.

It also makes virtq accesses in the rx path safe to run in parallel
to the VCPU thread: the host now keeps track of the last index into
the 'avail' ring that the driver has notified it about. It also makes
sure that the host only writes back to the 'avail' ring instead of
modifying the whole receive virtq.

While there, describe what virtio_vq_info and virtio_io_cfg are used
for, as suggested by mlarkin@

ok mlarkin@
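
The index tracking described above can be sketched roughly as follows.
This is an illustration only: the struct layout, QUEUE_SIZE, and vq_drain()
are hypothetical names, not the definitions from virtio.h.

```c
#include <stdint.h>

#define QUEUE_SIZE 8	/* hypothetical size; must be a power of two */

/* Simplified 'avail' ring: the driver (guest) is the only writer. */
struct avail_ring {
	uint16_t idx;			/* free-running producer index */
	uint16_t ring[QUEUE_SIZE];	/* descriptor chain heads */
};

/* Device-side state: where consumption stopped last time. */
struct vq_state {
	uint16_t last_avail;
};

/*
 * Copy up to max descriptor heads published since the previous call
 * into out[] and advance last_avail. Returns the number copied.
 * Both indices are free-running 16-bit counters, so wraparound is
 * handled by comparing for inequality and masking on access.
 */
int
vq_drain(struct vq_state *vq, const struct avail_ring *avail,
    uint16_t *out, int max)
{
	int n = 0;

	while (vq->last_avail != avail->idx && n < max) {
		out[n++] = avail->ring[vq->last_avail & (QUEUE_SIZE - 1)];
		vq->last_avail++;
	}
	return n;
}
```

Because the consumer only reads the ring and updates its private
last_avail, it does not need to touch state owned by the driver.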


Revision tags: OPENBSD_5_9_BASE
# 1.3 03-Dec-2015 reyk

spacing


# 1.2 22-Nov-2015 reyk

Add $ Ids


# 1.1 22-Nov-2015 mlarkin

vmd(8) - virtual machine daemon.

There is still a lot to be done, and fixed, in these userland components
but I have received enough "it works, commit it" emails that it's time
to finish those things in tree.

discussed with many, tested by many.


# 1.36 07-Jan-2021 tracey

bump VM shutdown event timeout ok mlarkin@ stsp@ florian@

VMs with additional package daemons were not given enough time to shut down
gracefully.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.35 11-Dec-2019 pd

vmd: proper concurrency control when pausing a vm

Removes an XXX which slept for 1s waiting for the vcpu thread to reach HLT and
pause. We now define a paused and unpaused condition so that a call to
pause_vm() / vmctl pause blocks till the vm really reaches a paused state.

Also, detach events for devices from event loop when pausing and add them back
when unpausing. This is because some callbacks call pthread_mutex_lock and if
the vm is paused, it would block also causing the libevent thread to block.
This would mean that we would not be able to process any IMSGs received from vmm
(parent process) including a message to unpause.


ok mlarkin@


Revision tags: OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.34 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is
currently used only when booting an OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


# 1.33 26-Nov-2018 reyk

Move the {qcow2,raw} create functions from vmctl into vmd/vio{qcow2,raw}.c

This way they are in the appropriate place and code can be shared with vmd.

Ok ori@ mlarkin@ ccardenas@


# 1.32 19-Oct-2018 reyk

Add support to create and convert disk images from existing images

The -i option to vmctl create (e.g. vmctl create output.qcow2 -i input.img)
lets you create a new image from an input file and convert it if it is a
different format. This allows creating qcow2 images from raw images,
raw from qcow2, or even qcow2 from qcow2 and raw from raw to re-optimize
the disk.

This re-uses Ori's vioqcow2.c from vmd by reaching into it and
compiling it in. The API has been adjusted to be used from both vmctl
and vmd accordingly.

OK mlarkin@


Revision tags: OPENBSD_6_4_BASE
# 1.31 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@
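
The lookup order described above (derived image first, then the base) can
be sketched as below. This is a simplified model, not vioqcow2.c: the
struct, NCLUSTERS, and image_lookup() are hypothetical, and clusters are
represented as plain pointers rather than on-disk offsets.

```c
#include <stddef.h>

#define NCLUSTERS 4	/* hypothetical tiny image for illustration */

struct image {
	const struct image *base;		/* NULL for a base image */
	const char *clusters[NCLUSTERS];	/* NULL = not allocated here */
};

/*
 * Walk from the derived image towards the base and return the first
 * allocated copy of the cluster. NULL means no image in the chain
 * holds the cluster (or the index is out of range).
 */
const char *
image_lookup(const struct image *img, size_t cluster)
{
	if (cluster >= NCLUSTERS)
		return NULL;
	for (; img != NULL; img = img->base)
		if (img->clusters[cluster] != NULL)
			return img->clusters[cluster];
	return NULL;
}
```

The sketch also shows why modifying the base is unsafe: any cluster the
derived image has not overridden is read straight from the base.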


# 1.30 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.29 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.28 09-Sep-2018 ccardenas

Add initial qcow2 image support.

Users are able to declare disk images as 'raw' or 'qcow2' using either
vmctl or vm.conf. The default disk image format is 'raw' if not specified.

Examples of using disk format:

vmctl start bsd -Lc -r cd64.iso -d qcow2:current.qc2
or
vmctl start bsd -Lc -r cd64.iso -d raw:current.raw
is equivalent to
vmctl start bsd -Lc -r cd64.iso -d current.raw

in vm.conf
vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.qc2" format "qcow2"
	interface { switch "external" }
}

or

vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.raw" format "raw"
	interface { switch "external" }
}

is equivalent to

vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.raw"
	interface { switch "external" }
}

Tested by many.

Big Thanks to Ori Bernstein.


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.34 06-Dec-2018 claudio

Make it possible to define the bootdevice in vmd. This information is used
currently only when booting a OpenBSD kernel. If VMBOOTDEV_NET is used the
internal dhcp server will pass "auto_install" as boot file to the client and
the boot loader passes the MAC of the first interface to the kernel to indicate
PXE booting. Adding boot order support to SeaBIOS is not yet implemented.
Ok ccardenas@


# 1.33 26-Nov-2018 reyk

Move the {qcow2,raw} create functions from vmctl into vmd/vio{qcow2,raw}.c

This way they are in the appropriate place and code can be shared with vmd.

Ok ori@ mlarkin@ ccardenas@


# 1.32 19-Oct-2018 reyk

Add support to create and convert disk images from existing images

The -i option to vmctl create (eg. vmctl create output.qcow2 -i input.img)
lets you create a new image from an input file and convert it if it is a
different format. This allows to convert qcow2 images from raw images,
raw from qcow2, or even qcow2 from qcow2 and raw from raw to re-optimize
the disk.

This re-uses Ori's vioqcow2.c from vmd by reaching into it and
compiling it in. The API has been adjust to be used from both vmctl
and vmd accordingly.

OK mlarkin@


Revision tags: OPENBSD_6_4_BASE
# 1.31 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This works is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds ot the base image. Multiple derived images may exist off of
a single base image.

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating disk derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.30 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.29 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.28 09-Sep-2018 ccardenas

Add initial qcow2 image support.

Users are able to declare disk images as 'raw' or 'qcow2' using either
vmctl and vm.conf. The default disk image format is 'raw' if not specified.

Examples of using disk format:

vmctl start bsd -Lc -r cd64.iso -d qcow2:current.qc2
or
vmctl start bsd -Lc -r cd64.iso -d raw:current.raw
is equivalent to
vmctl start bsd -Lc -r cd64.iso -d current.raw

in vm.conf
vm "current" {
disable
memory 2G
disk "/home/user/vmm/current.qc2" format "qcow2"
interface { switch "external" }
}

or

vm "current" {
disable
memory 2G
disk "/home/user/vmm/current.raw" format "raw"
interface { switch "external" }
}

is equivlanet to

vm "current" {
disable
memory 2G
disk "/home/user/vmm/current.raw"
interface { switch "external" }
}

Tested by many.

Big Thanks to Ori Bernstein.


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and cleanup some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in the respective function (vioscsi_handle_xxx)
which allows more functionality to be added easier.

No functional changes confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.17 08-May-2017 reyk

Adds functions to read and write state of devices in vmd.

This is required for implementing vmctl send and vmctl receive. vmctl
send / receive are two new options that will support snapshotting VMs
and migrating VMs from one host to another. The atomicio files are
copied from usr.bin/ssh.

Patch from Pratik Vyas; this project was undertaken at San Jose State
University along with his three teammates, Ashwin, Harshada and Siri
with mlarkin@ as the advisor.

OK mlarkin@


# 1.16 02-May-2017 mlarkin

Resynchronize the guest RTC via vmmci(4) on host resume from zzz/ZZZ
(vmd part)

This feature is for OpenBSD guests only.

ok reyk, kettenis


# 1.15 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.14 27-Mar-2017 deraadt

die whitespace die die die


# 1.13 26-Mar-2017 mlarkin

Implement a missing command in vioblk and allow > MAXPHYS transfers.

This diff (with the others previously committed) allows ubuntu 14.04
amd64 guests to work.


# 1.12 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.11 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.10 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.9 21-Jan-2017 mlarkin

updated include paths for recently moved virtio stuff


# 1.8 19-Jan-2017 reyk

Export the host time to the guest, add it as a timedelta sensor in vmmci(4)

OK kettenis@ mlarkin@


# 1.7 13-Jan-2017 reyk

Add host side of vmmci(4) to vmd(8).

It currently uses the device to request graceful shutdown of a VM on
"vmctl stop myvm" but will be extended for reboot and other edge cases.

OK mlarkin@


# 1.6 12-Oct-2016 mlarkin

Allow 4 vio(4) interfaces in each VM. Also fix a bad interrupt assignment that
caused IRQ9 to be shared between the second disk device and the vio(4)s,
which caused poor network performance.

ok reyk, stefan


# 1.5 02-Sep-2016 stefan

Process incoming host->guest packets asynchronously to running VCPU

This registers a handler with libevent that is called on incoming packets
for the guest. If they cannot be handled immediately (because the virtq is
full), make sure they are handled on VCPU exits.

ok mlarkin@


Revision tags: OPENBSD_6_0_BASE
# 1.4 09-Jul-2016 stefan

Prepare vionet to be handled asynchronously to the VCPU thread

This splits the handling of received data into a separate function
that can later be called in parallel to the VCPU thread instead of
handling received packets on VCPU exits only.

It also makes virtq accesses in the rx path safe to run in parallel
to the VCPU thread: the host keeps track of the last index into the
'avail' ring that the driver has notified it about. It also makes sure
that the host only writes back to the 'avail' ring instead of modifying
the whole receive virtq.
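
A minimal sketch of that bookkeeping, assuming the free-running 16-bit ring index used by virtio split virtqueues. The function names `vq_pending` and `vq_slot` are hypothetical and do not come from vmd.

```c
#include <stdint.h>

/*
 * Number of entries the driver has published since the host last looked.
 * Both values are free-running 16-bit counters, so the subtraction is
 * done modulo 65536 and stays correct across wraparound.
 */
uint16_t
vq_pending(uint16_t avail_idx, uint16_t last_avail)
{
	return (uint16_t)(avail_idx - last_avail);
}

/* Ring slot that a free-running index refers to, for a queue of qsize. */
uint16_t
vq_slot(uint16_t idx, uint16_t qsize)
{
	return (uint16_t)(idx % qsize);
}
```

Because the host only remembers its own position and never rewrites the driver's index, a receive handler and the VCPU thread can look at the ring at the same time without corrupting it.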

While there, describe what virtio_vq_info and virtio_io_cfg are used
for, as suggested by mlarkin@

ok mlarkin@


Revision tags: OPENBSD_5_9_BASE
# 1.3 03-Dec-2015 reyk

spacing


# 1.2 22-Nov-2015 reyk

Add $ Ids


# 1.1 22-Nov-2015 mlarkin

vmd(8) - virtual machine daemon.

There is still a lot to be done, and fixed, in these userland components
but I have received enough "it works, commit it" emails that it's time
to finish those things in tree.

discussed with many, tested by many.


# 1.32 19-Oct-2018 reyk

Add support to create and convert disk images from existing images

The -i option to vmctl create (e.g. vmctl create output.qcow2 -i input.img)
lets you create a new image from an input file and convert it if it is a
different format. This allows creating qcow2 images from raw images,
raw from qcow2, or even qcow2 from qcow2 and raw from raw to re-optimize
the disk.

This re-uses Ori's vioqcow2.c from vmd by reaching into it and
compiling it in. The API has been adjusted to be used from both vmctl
and vmd accordingly.

OK mlarkin@


Revision tags: OPENBSD_6_4_BASE
# 1.31 08-Oct-2018 reyk

Add support for qcow2 base images (external snapshots).

This work is from Ori Bernstein, committing on his behalf:

Add support to vmd for external snapshots. That is, snapshots that are
derived from a base image. Data lookups start in the derived image,
and if the derived image does not contain some data, the search
proceeds to the base image. Multiple derived images may exist off of
a single base image.
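
The lookup order described above can be sketched as follows. This is a deliberately simplified model, not the qcow2 on-disk format or vmd's implementation: `struct image`, `NCLUSTERS`, and `image_read` are hypothetical, and each cluster is reduced to a single byte.

```c
#include <stddef.h>
#include <stdint.h>

#define NCLUSTERS 4	/* hypothetical, tiny image for illustration */

struct image {
	const struct image *base;	/* NULL for a standalone image */
	uint8_t present[NCLUSTERS];	/* nonzero if the cluster lives here */
	uint8_t data[NCLUSTERS];	/* stand-in for cluster contents */
};

/*
 * Start in the derived image and fall back to its base (and the base's
 * base, if any) until an image holding the cluster is found. A cluster
 * that is allocated nowhere reads as zero.
 */
uint8_t
image_read(const struct image *img, size_t cluster)
{
	if (cluster >= NCLUSTERS)
		return 0;
	for (; img != NULL; img = img->base) {
		if (img->present[cluster])
			return img->data[cluster];
	}
	return 0;
}
```

The sketch also shows why modifying the base is dangerous: any cluster the derived image has not overridden is read straight from the base.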

A limitation of this format is that modifying the base image will
corrupt the derived image.

This change also adds support for creating derived disk images to
vmctl. To use it:

vmctl create derived.qcow2 -s 16G -b base.qcow2

From Ori Bernstein
OK mlarkin@ reyk@


# 1.30 28-Sep-2018 reyk

Support vmd-internal's vmboot with qcow2 disk images.

OK mlarkin@


# 1.29 19-Sep-2018 ccardenas

Various clean up items for disks.

- qcow2: general cleanup
- vioraw: check malloc
- virtio: add function to sync disks
- vm: call virtio_shutdown to sync disks when vm is finished executing

Thanks to Ori Bernstein.

Ok miko@


# 1.28 09-Sep-2018 ccardenas

Add initial qcow2 image support.

Users are able to declare disk images as 'raw' or 'qcow2' using either
vmctl or vm.conf. The default disk image format is 'raw' if not specified.

Examples of using disk format:

vmctl start bsd -Lc -r cd64.iso -d qcow2:current.qc2
or
vmctl start bsd -Lc -r cd64.iso -d raw:current.raw
is equivalent to
vmctl start bsd -Lc -r cd64.iso -d current.raw

in vm.conf
vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.qc2" format "qcow2"
	interface { switch "external" }
}

or

vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.raw" format "raw"
	interface { switch "external" }
}

is equivalent to

vm "current" {
	disable
	memory 2G
	disk "/home/user/vmm/current.raw"
	interface { switch "external" }
}

Tested by many.

Big Thanks to Ori Bernstein.


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and clean up some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in the respective function (vioscsi_handle_xxx)
which allows more functionality to be added more easily.

No functional changes; confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
  CentOS 6 (secondary), Ubuntu 17.10 (secondary).
  NOTE: Secondary indicates some issue(s) preventing full/reliable
  functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
  default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.27 25-Aug-2018 ccardenas

Rework disks to have pluggable backends.

This is prep work for adding qcow2 image support.

From Ori Bernstein. Many thanks!

Tested by many.

OK ccardenas@


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and cleanup some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in the respective function (vioscsi_handle_xxx)
which allows more functionality to be added easier.

No functional changes confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.17 08-May-2017 reyk

Adds functions to read and write state of devices in vmd.

This is required for implementing vmctl send and vmctl receive. vmctl
send / receive are two new options that will support snapshotting VMs
and migrating VMs from one host to another. The atomicio files are
copied from usr.bin/ssh.

Patch from Pratik Vyas; this project was undertaken at San Jose State
University along with his three teammates, Ashwin, Harshada and Siri
with mlarkin@ as the advisor.

OK mlarkin@


# 1.16 02-May-2017 mlarkin

Resynchronize the guest RTC via vmmci(4) on host resume from zzz/ZZZ
(vmd part)

This feature is for OpenBSD guests only.

ok reyk, kettenis


# 1.15 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.14 27-Mar-2017 deraadt

die whitespace die die die


# 1.13 26-Mar-2017 mlarkin

Implement a missing command in vioblk and allow > MAXPHYS transfers.

This diff (with the others previously committed) allows ubuntu 14.04
amd64 guests to work.


# 1.12 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis


# 1.11 15-Mar-2017 reyk

Improve vmmci(4) shutdown and reboot.

This change handles various cases to power off the VM, even if it is
unresponsive, stuck in ddb, or when the shutdown was initiated from
the VM guest side. Usage of timeout and VM ACKs make sure that the VM
is really turned off at some point.

OK mlarkin@


# 1.10 02-Mar-2017 reyk

Add "locked lladdr" option to prevent VMs from spoofing MAC addresses.

This is especially useful when multiple VMs share a switch, the
implementation is independent from the underlying switch or bridge.

no objections mlarkin@


# 1.9 21-Jan-2017 mlarkin

updated include paths for recently moved virtio stuff


# 1.8 19-Jan-2017 reyk

Export the host time to the guest, add it as a timedelta sensor in vmmci(4)

OK kettenis@ mlarkin@


# 1.7 13-Jan-2017 reyk

Add host side of vmmci(4) to vmd(8).

It currently uses the device to request graceful shutdown of a VM on
"vmctl stop myvm" but will be extended for reboot and a other edge cases.

OK mlarkin@


# 1.6 12-Oct-2016 mlarkin

Allow 4 vio(4) interfaces in each VM. Also fix a bad interrupt assignment that
caused IRQ9 to be shared between the second disk device and the vio(4)s,
which caused poor network performance.

ok reyk, stefan


# 1.5 02-Sep-2016 stefan

Process incoming host->guest packets asynchronously to running VCPU

This registers a handler with libevent that is called on incoming packets
for the guest. If they cannot be handled immediately (because the virtq is
full), make sure they are handled on VCPU exits.

ok mlarkin@


Revision tags: OPENBSD_6_0_BASE
# 1.4 09-Jul-2016 stefan

Prepare vionet to be handled asynchronously to the VCPU thread

This splits the handling of received data into a separate function
that can later be called in parallel to the VCPU thread instead of
handling received packets on VCPU exits only.

It also makes virtq accesses in the rx path safe to run in parallel
to the VCPU thread: the last index into the 'avail' ring the driver
has notified to the host is kept track of. It also makes sure that
the host only writes back to the 'avail' ring instead of modifying
the whole receive virtq.
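
The index tracking described above can be sketched as follows. This is only an illustration under stated assumptions, not vmd's code: the struct vq_state and the helpers vq_pending() and vq_next_slot() are hypothetical names, the queue size is assumed to be a power of two, and the driver-published index is assumed to be a free-running 16-bit counter as in the virtio ring layout.

```c
#include <stdint.h>

/* Hypothetical per-queue state; not the layout of vmd's virtio_vq_info. */
struct vq_state {
	uint16_t qsize;		/* ring entries, assumed to be a power of two */
	uint16_t last_avail;	/* next 'avail' index the host has not consumed */
};

/*
 * Number of entries the driver has published that the host has not yet
 * consumed. Both indices are free-running 16-bit counters, so the
 * subtraction is done modulo 2^16 and stays correct across wraparound.
 */
static uint16_t
vq_pending(const struct vq_state *vq, uint16_t avail_idx)
{
	return (uint16_t)(avail_idx - vq->last_avail);
}

/*
 * Ring slot of the next entry to consume. The remembered index is
 * advanced, so a later call continues where this one stopped.
 */
static uint16_t
vq_next_slot(struct vq_state *vq)
{
	return (uint16_t)((vq->last_avail++) & (vq->qsize - 1));
}
```

Because the host keeps its own copy of the last consumed index, a receive handler running alongside the VCPU thread only has to compare that copy with the index the driver published to find the new entries.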

While there, describe what virtio_vq_info and virtio_io_cfg are used
for, as suggested by mlarkin@

ok mlarkin@


Revision tags: OPENBSD_5_9_BASE
# 1.3 03-Dec-2015 reyk

spacing


# 1.2 22-Nov-2015 reyk

Add $ Ids


# 1.1 22-Nov-2015 mlarkin

vmd(8) - virtual machine daemon.

There is still a lot to be done, and fixed, in these userland components
but I have received enough "it works, commit it" emails that it's time
to finish those things in tree.

discussed with many, tested by many.


# 1.26 09-Jul-2018 mlarkin

vmd(8): stash device IRQ in the device struct

ok kettenis


# 1.25 26-Apr-2018 mlarkin

vmd(8): bump virtio network max queue size to 256 (to match qemu)


# 1.24 26-Apr-2018 mlarkin

vmd(8): use #defines for queue indices and cleanup some code

ok phessler


Revision tags: OPENBSD_6_3_BASE
# 1.23 15-Jan-2018 ccardenas

VMD: vioscsi refactor

Each opcode is now handled in the respective function (vioscsi_handle_xxx)
which allows more functionality to be added more easily.

No functional changes confirmed by guest testing.

ok mlarkin@


# 1.22 03-Jan-2018 ccardenas

Add initial CD-ROM support to VMD via vioscsi.

* Adds 'cdrom' keyword to vm.conf(5) and '-r' to vmctl(8)
* Support various sized ISOs (Limitation of 4G ISOs on Linux guests)
* Known working guests: OpenBSD (primary), Alpine Linux (primary),
CentOS 6 (secondary), Ubuntu 17.10 (secondary).
NOTE: Secondary indicates some issue(s) preventing full/reliable
functionality outside the scope of the vioscsi work.
* If the attached disks are non-bootable (i.e. empty), SeaBIOS (vmd's
default BIOS) will boot from CD-ROM.

ok mlarkin@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.21 17-Sep-2017 pd

vmd: send/recv pci config space instead of recreating pci devices on receive

ok mlarkin@


# 1.20 12-Aug-2017 mlarkin

vmd: bump virtio queue size back to 128. The problem that resulted in
lowering the queue size to 64 was caused by something unrelated.


# 1.19 20-Jun-2017 mlarkin

Revert a previous commit that increased the virtio queue size since it
appears to be causing some instability.


# 1.18 30-May-2017 mlarkin

increase vmd(8) virtio queue size from 64 to 128. Also fix an old
copypaste bug that didn't hurt us as long as all the queue sizes were
the same, which was the case up to now.

suggested by sf@, ok krw@


# 1.17 08-May-2017 reyk

Adds functions to read and write state of devices in vmd.

This is required for implementing vmctl send and vmctl receive. vmctl
send / receive are two new options that will support snapshotting VMs
and migrating VMs from one host to another. The atomicio files are
copied from usr.bin/ssh.

Patch from Pratik Vyas; this project was undertaken at San Jose State
University along with his three teammates, Ashwin, Harshada and Siri
with mlarkin@ as the advisor.

OK mlarkin@


# 1.16 02-May-2017 mlarkin

Resynchronize the guest RTC via vmmci(4) on host resume from zzz/ZZZ
(vmd part)

This feature is for OpenBSD guests only.

ok reyk, kettenis


# 1.15 19-Apr-2017 reyk

Add support for dynamic "NAT" interfaces (-L/local interface).

When a local interface is configured, vmd configures a /31 address on
the tap(4) interface of the host and provides another IP in the same
subnet via DHCP (BOOTP) to the VM. vmd runs an internal BOOTP server
that replies with IP, gateway, and DNS addresses to the VM. The
built-in server only ever responds to the VM on the inside and cannot
leak its DHCP responses to the outside.
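
A /31 subnet holds exactly two addresses that differ only in the lowest bit, so once one side is known the other follows directly. The sketch below illustrates that arithmetic only; slash31_peer() and slash31_same_subnet() are hypothetical helpers, addresses are assumed to be in host byte order, and which of the two addresses vmd gives to the host or to the VM is not stated here.

```c
#include <stdint.h>

/*
 * Return the other address of the /31 that contains addr.
 * addr is an IPv4 address in host byte order (an assumption of this
 * sketch); the two members of a /31 differ only in bit 0.
 */
static uint32_t
slash31_peer(uint32_t addr)
{
	return addr ^ 1U;
}

/* Return 1 if a and b fall into the same /31, 0 otherwise. */
static int
slash31_same_subnet(uint32_t a, uint32_t b)
{
	return ((a ^ b) >> 1) == 0;
}
```

For example, with 100.64.0.2 (0x64400002) on one side, the peer is 100.64.0.3, and applying the helper again returns the original address.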

Thanks to Uwe Werler, Josh Grosse, and some others for testing!

OK deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.14 27-Mar-2017 deraadt

die whitespace die die die


# 1.13 26-Mar-2017 mlarkin

Implement a missing command in vioblk and allow > MAXPHYS transfers.

This diff (with the others previously committed) allows ubuntu 14.04
amd64 guests to work.


# 1.12 25-Mar-2017 mlarkin

Implement some missing functionality and clean up some code in vmd
pci emulation.

ok kettenis

