History log of /freebsd-current/usr.sbin/bhyve/pci_passthru.c
Revision Date Author Comments
# 4d65a7c6 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

usr.sbin: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 01d53c34 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Improve pcifd function naming

read_config() and write_config() are externally visible, so give them
more descriptive names. No functional change intended.

MFC after: 1 week
Sponsored by: Innovate UK


# 1d386b48 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# b3e76948 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 90c3a1b6 09-May-2023 Corvin Köhne <corvink@FreeBSD.org>

bhyve: add empty GVT-d emulation

Don't emulate anything yet. Just check if the user would like to pass an
Intel GPU to the guest.

Reviewed by: jhb, markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D40038


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 93cf9317 09-May-2023 Corvin Köhne <corvink@FreeBSD.org>

bhyve: add helper for passthru specific mmio ranges

Intel GPUs have two special memory regions. They are called Graphics
Stolen Memory and OpRegion. bhyve has to emulate both of them. In order
to keep track of those special regions, add generic mmio ranges to the
passthru emulation.

Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D40036


# 60793cee 09-May-2023 Corvin Köhne <corvink@FreeBSD.org>

bhyve: make passthru sel public available

The GVT-d emulation requires access to this selector to read from the
device.

Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D40035


# b6e67875 07-Sep-2021 Corvin Köhne <corvink@FreeBSD.org>

bhyve: add hook for PCI header of passthru devices

Most register of the PCI header are either constant values or require
emulation anyway. The command and status register are the only exception which
require hardware access. So, we're adding an emulation handler for all
other register.

As this emulation handler will be reused by some future features like
GPU passthrough, we directly export it.

Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D33010


# 931bb7bf 19-Mar-2021 Corvin Köhne <corvink@FreeBSD.org>

bhyve: define array to protect passthru regs

GPU passthrough requires a special handling of some PCI config register.
Therefore, we need a flexible approach for implementing it. Adding an
array of handler meets this condition.

Start by using the default handler for all accesses to the PCI config
space. In upcoming commits, we can start to split the default handler
into several handler for each register that requires emulation.

Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D39291


# 7d9ef309 24-Mar-2023 John Baldwin <jhb@FreeBSD.org>

libvmmapi: Add a struct vcpu and use it in most APIs.

This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.

struct vcpu is an opaque type managed by libvmmapi. For now it stores
a pointer to the VM context and an integer id.

As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.

Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.

While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().

Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server. The underyling ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38124


# 6a284cac 19-Jan-2023 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove vmctx argument from PCI device model methods.

Most of these arguments were unused. Device models which do need
access to the vmctx in one of these methods can obtain it from the
pi_vmctx member of the pci_devinst argument instead.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38096


# 78c2cd83 09-Dec-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove unused vcpu argument from PCI read/write methods.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37652


# 0857e555 09-Dec-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Pass a vCPU ID of 0 to vm_setup_pptdev_msi*.

These ioctls are not vCPU-specific and the ioctl now ignores the vCPU
ID. 0 is used instead of -1 to provide limited forwards
compatibility.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37651


# 32b21dd2 28-Nov-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Appease warning about a potentially unaligned pointer.

When initializing the device model for a PCI pass through device that
uses MSI-X, bhyve reads the MSI-X capability from the real device to
save a copy in the emulated PCI config space. It also saves a copy in
a local struct msixcap on the stack. Since struct msixcap is packed,
GCC complains that casting a pointer to the struct to a uint32_t
pointer may result in an unaligned pointer.

This path is not performance critical, so to appease the compiler,
simply change the pointer to a char * and use memcpy to copy the 4
bytes read in each iteration of the loop.

Reviewed by: corvink, bz, markj
Differential Revision: https://reviews.freebsd.org/D37490


# ed721684 23-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Address some signed/unsigned comparison warnings

MFC after: 1 week


# 63898728 22-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Avoid arithmetic on void pointers

No functional change intended.

MFC after: 1 week


# 98d920d9 08-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Annotate unused function parameters

MFC after: 1 week


# baf753cc 19-Aug-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Support other schemes for naming pass-through devices.

Permit naming pass through devices using the syntax accepted by
pciconf (pci[<domain>:]<bus>:<slot>:<func>) as well as by device name
(e.g. "ppt0").

While here, fix an error in the manpage that had the bus and slot
arguments for the original /-delimited scheme swapped.

Reviewed by: imp, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36147


# bcab868a 17-Aug-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Style fix for read/write_config.


# 37045dfa 16-Aug-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Mark variables and functions as static where appropriate

Mark them const as well when it makes sense to do so. No functional
change intended.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# 75ce327a 16-Aug-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Use "void" instead of empty parameter lists

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# 50526f52 27-Jul-2022 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: fix spelling mistake in passthru emulation

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D35707
Sponsored by: Beckhoff Automation GmbH & Co. KG


# 3256b7ca 01-Apr-2022 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: avoid an empty passthru config value

pci_parse_legacy_config splits the options string by comma characters.
strchr returns a pointer to the first occurence of a character. In that
case, it's a comma. So, pci_parse_legacy_config will stop at the first
character and creates a new config node with a name of NULL.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D34600


# e47fe318 10-Mar-2022 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: add ROM emulation

Some PCI devices especially GPUs require a ROM to work properly.
The ROM is executed by boot firmware to initialize the device.
To add a ROM to a device use the new ROM option for passthru device
(e.g. -s passthru,0/2/0,rom=<path>/<to>/<rom>).

It's necessary that the ROM is executed by the boot firmware.
It won't be executed by any OS.
Additionally, the boot firmware should be configured to execute the
ROM file.
For that reason, it's only possible to use a ROM when using
OVMF with enabled bus enumeration.

Differential Revision: https://reviews.freebsd.org/D33129
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 month


# 563fd224 10-Mar-2022 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: export funcs for read/write pci config

Export functions for reading and writing the pci config space from passthru
device to be used by other devices.
This is required for lpc devices to set their vendor/device ids to their
physical values.
Otherwise, GPU passthrough for integrated Intel GPUs won't work properly.

Differential Revision: https://reviews.freebsd.org/D33769
Reviewed by: markj
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 month


# 4558c11f 05-Jan-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Correct unmapping of the MSI-X table BAR

The starting address passed to mprotect was wrong, so in the case where
the last page containing the table is not the last page of the BAR, the
wrong region would be unmapped.

Reported by: Andy Fiddaman <andy@omniosce.org>
Reviewed by: jhb
Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough")
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33739


# 76b45e68 04-Jan-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Map the right BAR in init_msix_table()

The PBA and MSI-X table can reside in different BARs.

Reported by: Andy Fiddaman <andy@omniosce.org>
Reviewed by: jhb
Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough")
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33739


# 338a1be8 03-Jan-2022 Corvin Köhne <C.Koehne@beckhoff.com>

bhyve: only init MSI-X table if passthru device supports it

Some passthru devices only support MSI instead of MSI-X. For those
devices the initialization of MSI-X table will fail. Re-add the
check erroneously removed in f1442847c9404d4bc5f5524a0c3362dd39cb14f9.

MFC after: 3 days
X-MFC with: f1442847c9404d4bc5f5524a0c3362dd39cb14f9
PR: 260148
Reviewed by: manu, bz
Differential Revision: https://reviews.freebsd.org/D33728


# c2fa905c 26-Dec-2021 Toomas Soome <tsoome@FreeBSD.org>

bhyve: clean up trailing whitespaces

Clean up trailing whitespaces. No functional changes.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33681


# f1442847 23-Dec-2021 Bjoern A. Zeeb <bz@FreeBSD.org>

bhyve: passthru: enable BARs before possibly mmap(2)ing them

The first time we start bhyve with a passthru device everything is fine
as on boot we do enable BARs. If a driver (unload) inside bhyve disables
the BAR(s) as some Linux drivers do, we need to make sure we re-enable
them on next bhyve start.

If we are trying to mmap a disabled BAR for MSI-X (PCIOCBARMMAP)
the kernel will give us an EBUSY.
While we were re-enabling the BAR(s) in the current code loop
cfginit() was writing the changes out too late to the real hardware.

Move the call to init_msix_table() after the register on the real
hardware was updated. That way the kernel will be happy and the
mmap will succeed and bhyve will start.
Also simplify the code given the last argument to init_msix_table()
is unused we do not need to do checks for each bar. [1]

MFC after: 3 days
PR: 260148
Pointed out by: markj [1]
Sponsored by: The FreeBSD Foundation
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D33628


# fe66bcf9 22-Nov-2021 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: emulate reads of MSI-X capabilities for passthru devices

Reads of the MSI-X capabilites aren't emulated by passthru devices
yet. The guest will read the host MSI-X capabilites which could
cause issues.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D32686
Sponsored by: Beckhoff Automation GmbH & Co. KG


# 2eb20795 22-Nov-2021 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: keep physical and virtual COMMAND reg in sync

On startup all virtual BARs are registered.
Additionally, the encoding bit in the virtual cmd register is set.
After that, the passthru emulation overwrites the virtual cmd register with
the physical one.
This could lead to a mismatch between registered BARs and the encoding
bits in the cmd register.
Instead of writing the physical to the virtual cmd register,
write the virtual to the physical cmd register to solve this issue.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D32687
Sponsored by: Beckhoff Automation GmbH & Co. KG


# e87a6f3e 18-Nov-2021 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: use physical lobits for BARs of passthru devices

Tell the guest whether a BAR uses prefetched memory or not for
passthru devices by using the same lobits as the physical device.

Reviewed by: grehan
Sponsored by: Beckhoff Autmation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D32685


# 7fa23353 09-Oct-2021 Mark Johnston <markj@FreeBSD.org>

bhyve: Map the MSI-X table unconditionally for passthrough

It is possible for the PBA to reside in the same page as the MSI-X
table. And, while devices are not supposed to do this, at least some
Intel wifi devices place registers in a page shared with the MSI-X
table. To handle the first case we currently map the PBA page using
/dev/mem, and the second case is not handled.

Kill two birds with one stone: map the MSI-X table BAR using the
PCIOCBARMMAP ioctl instead of /dev/mem, and map the entire table so that
accesses beyond the bounds of the table can be emulated. Regions of the
BAR not containing the table are left unmapped.

Reviewed by: bz, grehan, jhb
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32359


# 42375556 14-Aug-2021 Mark Johnston <markj@FreeBSD.org>

bhyve: Use pci(4) to access I/O port BARs

This removes the dependency on /dev/io.

PR: 251046
Reviewed by: jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31308


# f8a6ec2d 18-Mar-2021 D Scott Phillips <scottph@FreeBSD.org>

bhyve: support relocating fbuf and passthru data BARs

We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.

We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.

[1]: https://github.com/freebsd/uefi-edk2/pull/9/

Originally by: scottph
MFC After: 1 week
Sponsored by: Intel Corporation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D24066


# 621b5090 26-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Refactor configuration management in bhyve.

Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs. The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key. Values are always stored as strings. The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.

The configuration values are stored in a tree using nvlists. Leaf
nodes hold string values. Configuration values are permitted to
reference other configuration values using '%(name)'. This permits
constructing template configurations.

All existing command line arguments now set configuration values. For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.

A new '-o' command line option permits setting an individual
configuration variable. The key name is always given as a full path
of dot-separated components.

A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable. Lines
starting with a '#' are comments.

In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values. Once this is
complete, bhyve then begins initializing its state based on the
configuration values. This means that subsequent configuration
options or files may override or supplement previously given settings.

A special 'config.dump' configuration value can be set to true to help
debug configuration issues. When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.

Most command line argments map to a single configuration variable,
e.g. '-w' sets the 'x86.strictmsr' value to false. A few command
line arguments have less obvious effects:

- Multiple '-p' options append their values (as a comma-seperated
list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).

- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
The first argument to '-s' (the device type) is used as the value of
a "device" variable. Additional comma-separated arguments are then
parsed into 'key=value' pairs and used to set additional variables
under the device node. A PCI device emulation driver can provide
its own hook to override the parsing of the additonal '-s' arguments
after the device type.

After the configuration phase as completed, the init_pci hook
then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the
"device" value to find the device model to use. The device
model's init routine is passed a reference to its nvlist node
in the configuration tree which it can query for specific
variables.

The result is that a lot of the string parsing is removed from
the device models and centralized. In addition, adding a new
variable just requires teaching the model to look for the new
variable.

- For '-l' options, a similar model is used where the string is
parsed into values that are later read during initialization.
One key note here is that the serial ports use the commonly
used lowercase names from existing documentation and examples
(e.g. "lpc.com1") instead of the uppercase names previously
used internally in bhyve.

Reviewed by: grehan
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D26035


# 1925586e 24-Nov-2020 John Baldwin <jhb@FreeBSD.org>

Honor the disabled setting for MSI-X interrupts for passthrough devices.

Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough
device and invoke it if a write to the MSI-X capability registers
disables MSI-X. This avoids leaving MSI-X interrupts enabled on the
host if a guest device driver has disabled them (e.g. as part of
detaching a guest device driver).

This was found by Chelsio QA when testing that a Linux guest could
switch from MSI-X to MSI interrupts when using the cxgb4vf driver.

While here, explicitly fail requests to enable MSI on a passthrough
device if MSI-X is enabled and vice versa.

Reported by: Sony Arpita Das @ Chelsio
Reviewed by: grehan, markj
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27212


# 038f5c7b 11-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

bhyve: remove a hack to map all 8G BARs 1:1

Suggested and reviewed by: grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D27186


# 21368498 25-May-2020 Peter Grehan <grehan@FreeBSD.org>

Fix pci-passthru MSI issues with OpenBSD guests

- Return 2 x 16-bit registers in the correct byte order
for a 4-byte read that spans the CMD/STATUS register.
This reversal was hiding the capabilities-list, which prevented
the MSI capability from being found for XHCI passthru.

- Reorganize MSI/MSI-x config writes so that a 4-byte write at the
capability offset would have the read-only portion skipped.
This prevented MSI interrupts from being enabled.

Reported and extensively tested by Anatoli (me at anatoli dot ws)

PR: 245392
Reported by: Anatoli (me at anatoli dot ws)
Reviewed by: jhb (bhyve)
Approved by: jhb, bz (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24951


# c7ba149d 11-Dec-2019 John Baldwin <jhb@FreeBSD.org>

Emulate reads of the PCI command register for passthrough devices.

VFs return zero for the memory enable bit even if it has been set by a
prior write. After r348779 this caused the annoying behavior that a
guest OS would unintentionally disable memory decoding on a future
read-modify-write operation on the command register. Instead, return
the shadow value of the command register for reads. This ensures that
the guest will only toggle the state of the memory enable bit when it
specifically intends to do so.

MFC after: 2 weeks
Sponsored by: Chelsio Communications


# dbb15211 12-Jul-2019 Sean Chittenden <seanc@FreeBSD.org>

usr.sbin/bhyve: only unassign a pt device after obtaining bus/slot/func

Coverity CID: 1194302, 1194303, 1194304
Approved by: jhb, markj
Differential Revision: https://reviews.freebsd.org/D20933


# 56282675 07-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Keep the shadow PCIR_COMMAND synced with the real one for pass through.

This ensures that bhyve properly recognizes when decoding is disabled
for BARs on passthru devices. To properly handle writes to the
register, export a pci_emul_cmd_changed function from pci_emul.c that
the pass through device model invokes for config writes that change
PCIR_COMMAND.

Reviewed by: rgrimes
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20531


# 24be3f51 05-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Don't simulate PBA access if the PBA is in a separate BAR.

bhyve has to virtualize the MSI-X table to trap reads and writes to
that table and map those to virtual interrupts that it maps real host
interrupts on to. For the pending-bit-array (PBA), bhyve passes
accesses from the guest directly to the hardware.

bhyve's virtualization of the MSI-X table is done by intercepting all
reads and writes to the BAR holding the MSI-X table. However, if the
PBA is stored in the same BAR as the MSI-X table, accesses to the PBA
portion of this BAR have to be forwarded to the real BAR.

However, in the case that the PBA was stored in a separate BAR and
it's offset in that separate BAR overlapped with the portion of the
MSI-X table BAR that the table used, the handlers for the table BAR
would incorrectly think that some accesses were PBA reads and writes.
This caused a crash in bhyve when it indirected a NULL pointer. Fix
this case by never trying to handle PBA access if the PBA lives in a
separate BAR.

Reported by: gallatin
Tested by: gallatin
Reviewed by: markj, Patrick Mooney
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20523


# abfa3c39 15-Jan-2019 Marcelo Araujo <araujo@FreeBSD.org>

Use capsicum_helpers(3) that allow us to simplify the code and its functions
will return success when the kernel is built without support of
the capability mode.

It is important to note, that I'm taking a more conservative approach
with these changes and it will be done in small steps.

Reviewed by: jhb
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D18744


# 1de7b4b8 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

various: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# 007e172d 26-Apr-2017 Gleb Smirnoff <glebius@FreeBSD.org>

We need CAP_MMAP_RW on memfd, since init_msix_table() may call mmap().


# 00ef17be 14-Feb-2017 Bartek Rutkowski <robak@FreeBSD.org>

Capsicum support for bhyve(8).

Adds Capsicum sandboxing to bhyve.

Submitted by: Pawel Biernacki <pawel.biernacki@gmail.com>
Reviewed by: grehan, oshogbo
Approved by: emaste, grehan
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D8290


# 98e21e80 05-Jul-2016 Enji Cooper <ngie@FreeBSD.org>

Fix gcc warnings

Remove -Wunused-but-set-variable (`error`). Cast calls with
`(void)` to note that the return value is explicitly ignored.

Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D7119
MFC after: 1 week
Reported by: Jenkins
Reviewed by: grehan (maintainer)
Sponsored by: EMC / Isilon Storage Division


# cff92ffd 19-Apr-2016 John Baldwin <jhb@FreeBSD.org>

Always emit an error message on passthru configuration errors.

Previously, many errors (such as the PCI device not being attached
to the ppt(4) driver) resulted in bhyve silently exiting without
starting the virtual machine. Now any errors encountered when
configuring a virtual slot for a PCI passthru device should be noted
on stderr.

Reviewed by: neel
Differential Revision: https://reviews.freebsd.org/D5990


# 5c40acf8 13-Apr-2016 John Baldwin <jhb@FreeBSD.org>

Handle PBA that shares a page with MSI-X table for passthrough devices.

If the PBA shares a page with the MSI-X table, map the shared page via
/dev/mem and emulate accesses to the portion of the PBA in the shared
page by accessing the mapped page.

Reviewed by: grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5919


# 9b1aa8d6 18-Jun-2015 Neel Natu <neel@FreeBSD.org>

Restructure memory allocation in bhyve to support "devmem".

devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.

devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.

Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).

Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D2762
MFC after: 4 weeks


# 994f858a 22-Apr-2014 Xin LI <delphij@FreeBSD.org>

Use calloc() in favor of malloc + memset.

Reviewed by: neel


# 7a902ec0 18-Feb-2014 Neel Natu <neel@FreeBSD.org>

Add a check to validate that memory BARs of passthru devices are 4KB aligned.

Also, the MSI-x table offset is not required to be 4KB aligned so take this
into account when computing the pages occupied by the MSI-x tables.


# 55888cfa 17-Dec-2013 Neel Natu <neel@FreeBSD.org>

Rename the ambiguously named 'vm_setup_msi()' and 'vm_setup_msix()' to
'vm_setup_pptdev_msi()' and 'vm_setup_pptdev_msix()' respectively.

It should now be clear that these functions operate on passthru devices.


# 4f8be175 16-Dec-2013 Neel Natu <neel@FreeBSD.org>

Add an API to deliver message signalled interrupts to vcpus. This allows
callers treat the MSI 'addr' and 'data' fields as opaque and also lets
bhyve implement multiple destination modes: physical, flat and clustered.

Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
Reviewed by: grehan@


# 4b5e84f6 11-Mar-2013 Neel Natu <neel@FreeBSD.org>

Convert the offset into the bar that contains the MSI-X table to an offset
into the MSI-X table before using it to calculate the table index.

In the common case where the MSI-X table is located at the begining of the
BAR these two offsets are identical and thus the code was working by accident.

This change will fix the case where the MSI-X table is located in the middle
or at the end of the BAR that contains it.

Obtained from: NetApp


# 2b89a044 31-Jan-2013 Neel Natu <neel@FreeBSD.org>

Fix a broken assumption in the passthru implementation that the MSI-X table
can only be located at the beginning or the end of the BAR.

If the MSI-table is located in the middle of a BAR then we will split the
BAR into two and create two mappings - one before the table and one after
the table - leaving a hole in place of the table so accesses to it can be
trapped and emulated.

Obtained from: NetApp


# aa12663f 31-Jan-2013 Neel Natu <neel@FreeBSD.org>

Fix a bug in the passthru implementation where it would assume that all
devices are MSI-X capable. This in turn would lead it to treat bar 0 as
the MSI-X table bar even if the underlying device did not support MSI-X.

Fix this by providing an API to query the MSI-X table index of the emulated
device. If the underlying device does not support MSI-X then this API will
return -1.

Obtained from: NetApp


# 2e81a7e8 21-Jan-2013 Neel Natu <neel@FreeBSD.org>

Allocate the memory for the MSI-X table dynamically instead of allocating 32KB
statically. In most cases the number of table entries will be far less than
the maximum of 2048 allowed by the PCI specification.

Reuse macros from pcireg.h to interpret the MSI-X capability instead of rolling
our own.

Obtained from: NetApp


# c3cbaac9 21-Jan-2013 Neel Natu <neel@FreeBSD.org>

Get rid of redundant 'table_size' field in struct pi_msix. If needed it can
always be calculated from the number of entries in the MSI-X table.

Obtained from: NetApp


# ba9b7bf7 27-Nov-2012 Neel Natu <neel@FreeBSD.org>

Revamp the x86 instruction emulation in bhyve.

On a nested page table fault the hypervisor will:
- fetch the instruction using the guest %rip and %cr3
- decode the instruction in 'struct vie'
- emulate the instruction in host kernel context for local apic accesses
- any other type of mmio access is punted up to user-space (e.g. ioapic)

The decoded instruction is passed as collateral to the user-space process
that is handling the PAGING exit.

The emulation code is fleshed out to include more addressing modes (e.g. SIB)
and more types of operands (e.g. imm8). The source code is unified into a
single file (vmm_instruction_emul.c) that is compiled into vmm.ko as well
as /usr/sbin/bhyve.

Reviewed by: grehan
Obtained from: NetApp


# a07896de 21-Nov-2012 Neel Natu <neel@FreeBSD.org>

MSI-X does not need to be enabled in the message control register for the
guest to access the MSI-x tables.

Obtained from: NetApp


# 4d1e669c 19-Oct-2012 Peter Grehan <grehan@FreeBSD.org>

Rework how guest MMIO regions are dealt with.

- New memory region interface. An RB tree holds the regions,
with a last-found per-vCPU cache to deal with the common case
of repeated guest accesses to MMIO registers in the same page.

- Support memory-mapped BARs in PCI emulation.

mem.c/h - memory region interface

instruction_emul.c/h - remove old region interface.
Use gpa from EPT exit to avoid a tablewalk to
determine operand address. Determine operand size
and use when calling through to region handler.

fbsdrun.c - call into region interface on paging
exit. Distinguish between instruction emul error
and region not found

pci_emul.c/h - implement new BAR callback api.
Split BAR alloc routine into routines that
require/don't require the BAR phys address.

ioapic.c
pci_passthru.c
pci_virtio_block.c
pci_virtio_net.c
pci_uart.c - update to new BAR callback i/f

Reviewed by: neel
Obtained from: NetApp


# cd942e0f 28-Apr-2012 Peter Grehan <grehan@FreeBSD.org>

MSI-x interrupt support for PCI pass-thru devices.

Includes instruction emulation for memory r/w access. This
opens the door for io-apic, local apic, hpet timer, and
legacy device emulation.

Submitted by: ryan dot berryhill at sandvine dot com
Reviewed by: grehan
Obtained from: Sandvine


# 366f6083 12-May-2011 Peter Grehan <grehan@FreeBSD.org>

Import of bhyve hypervisor and utilities, part 1.
vmm.ko - kernel module for VT-x, VT-d and hypervisor control
bhyve - user-space sequencer and i/o emulation
vmmctl - dump of hypervisor register state
libvmm - front-end to vmm.ko chardev interface

bhyve was designed and implemented by Neel Natu.

Thanks to the following folk from NetApp who helped to make this available:
Joe CaraDonna
Peter Snyder
Jeff Heller
Sandeep Mann
Steve Miller
Brian Pawlowski