#
685dc743 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
95ee2897 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: two-line .h pattern Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
|
#
e17eca32 |
|
23-May-2023 |
Mark Johnston <markj@FreeBSD.org> |
vmm: Avoid embedding cpuset_t ioctl ABIs Commit 0bda8d3e9f7a ("vmm: permit some IPIs to be handled by userspace") embedded cpuset_t into the vmm(4) ioctl ABI. This was a mistake since we otherwise have some leeway to change the cpuset_t for the whole system, but we want to keep the vmm ioctl ABI stable. Rework IPI reporting to avoid this problem. Along the way, make VM_RUN a bit more efficient: - Split vmexit metadata out of the main VM_RUN structure. This data is only written by the kernel. - Have userspace pass a cpuset_t pointer and cpusetsize in the VM_RUN structure, as is done for cpuset syscalls. - Have the destination CPU mask for VM_EXITCODE_IPIs live outside the vmexit info structure, and make VM_RUN copy it out separately. Zero out any extra bytes in the CPU mask, like cpuset syscalls do. - Modify the vmexit handler prototype to take a full VM_RUN structure. PR: 271330 Reviewed by: corvink, jhb (previous versions) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D40113
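The copy-out rule described above (userspace supplies a buffer and its size, and any extra bytes are zeroed) can be modelled in a few lines. This is a minimal Python sketch, not the kernel code: the function name `copy_out_cpu_mask` is hypothetical, and returning ERANGE when set bits would not fit in a smaller buffer is an assumption of this sketch rather than something the commit message states.

```python
import errno


def copy_out_cpu_mask(kernel_mask: bytes, user_size: int) -> bytes:
    """Model copying a kernel CPU mask into a user buffer of user_size bytes.

    Hypothetical helper: the name and the ERANGE policy are assumptions.
    """
    # Bytes past user_size cannot be reported; refuse if any of them has a bit set.
    if any(kernel_mask[user_size:]):
        raise OSError(errno.ERANGE, "CPU mask does not fit in user buffer")
    out = bytearray(user_size)  # bytearray(n) starts zero-filled
    n = min(len(kernel_mask), user_size)
    out[:n] = kernel_mask[:n]   # bytes beyond n stay zero, as the commit describes
    return bytes(out)
```

Because the size travels with the pointer, the kernel-side mask can grow or shrink without changing the layout of the ioctl structure itself.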
|
#
4d846d26 |
|
10-May-2023 |
Warner Losh <imp@FreeBSD.org> |
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
|
#
94a3876d |
|
17-Mar-2023 |
Vitaliy Gusev <gusev.vitaliy@gmail.com> |
vmm: fix missing ipi statistic ipi counters are missing in bhyvectl's output because vm_maxcpu is 0 when initializing them. That's because vmm_stat_register is executed before vmm_init. Instead of directly fixing it, there's a better solution in illumos which is cherry picked: https://github.com/illumos/illumos-gate/commit/65a3bc83734e5fb0fc2c19df3e5112b87dcdc3f8 It replaces the matrix statistic by two counters per vcpu. One for counting the ipis to the vcpu and one counting the ipis received by the vcpu. This has several advantages: - A matrix statistic becomes huge when using many vcpus. - A matrix statistic easily reaches the MAX_VMM_STAT_ELEMS limit. - Two counters are enough in most cases. DTrace can be used for more advanced debugging purposes. - A matrix statistic wastes memory. The matrix size is determined by vm_maxcpu regardless of the number of vcpus assigned to the vm. Reviewed by: corvink, markj Fixes: ee98f99d7a68b284a669fefb969cbfc31df2d0ab ("vmm: Convert VM_MAXCPU into a loader tunable hw.vmm.maxcpu.") MFC after: 1 week Sponsored by: vStack Differential Revision: https://reviews.freebsd.org/D39038
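To illustrate the trade-off, here is a small Python model of the two-counter scheme. It is a sketch under assumptions: the class and function names are hypothetical, and it assumes one counter tracks IPIs sent by a vCPU and the other IPIs received by it.

```python
from dataclasses import dataclass


@dataclass
class IpiCounters:
    """Two per-vCPU counters (hypothetical names) replacing one matrix row."""
    sent: int = 0
    received: int = 0


def record_ipi(counters: list, src: int, dst: int) -> None:
    # One IPI updates exactly two counters, whatever the vCPU count is.
    counters[src].sent += 1
    counters[dst].received += 1


def matrix_elems_per_vcpu(maxcpu: int) -> int:
    # A matrix statistic needs one element per possible destination vCPU.
    return maxcpu


def counter_elems_per_vcpu() -> int:
    # The replacement needs a fixed two elements per vCPU.
    return 2
```

The per-pair detail the matrix offered is lost, which is why the commit points to DTrace for more advanced debugging.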
|
#
b265a2e0 |
|
09-Feb-2023 |
Mark Johnston <markj@FreeBSD.org> |
vmm: Fix AP startup compatibility for old bhyve executables These changes unbreak AP startup when using a 13.1-RELEASE bhyve executable with a newer kernel: - Correct the destination mask for the VM_EXITCODE_IPI message generated by an INIT or STARTUP IPI in vlapic_icrlo_write_handler(). - Only initialize vlapics on active vCPUs. 13.1-RELEASE bhyve activates AP vCPUs only after the BSP starts them with an IPI, and vmm now allocates vcpu structures lazily, so the STARTUP handling in vm_handle_ipi() could trigger a page fault. - Fix an off-by-one setting the vcpuid in a VM_EXITCODE_SPINUP_AP message. Fixes: 7c326ab5bb9a ("vmm: don't lock a mtx in the icr_low write handler") Reviewed by: jhb, corvink MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D38446
|
#
f3bbd0e8 |
|
09-Feb-2023 |
Mark Johnston <markj@FreeBSD.org> |
vmm: Collapse identical case statements in vlapic_icrlo_write_handler() No functional change intended. Reviewed by: jhb, corvink MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D38446
|
#
7c326ab5 |
|
21-Nov-2022 |
Corvin Köhne <corvink@FreeBSD.org> |
vmm: don't lock a mtx in the icr_low write handler x2apic accesses are handled by a wrmsr exit. This handler is called in a critical section. So, we can't lock a mtx in the icr_low handler. Reported by: kp, pho Tested by: kp, pho Approved by: manu (mentor) Fixes: c0f35dbf19c3c8825bd2b321d8efd582807d1940 vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs. MFC after: 1 week MFC with: c0f35dbf19c3c8825bd2b321d8efd582807d1940 Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D37452
|
#
98568a00 |
|
18-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm: Allocate vCPUs on first use of a vCPU. Convert the vcpu[] array in struct vm to an array of pointers and allocate vCPUs on first use. This avoids always allocating VM_MAXCPU vCPUs for each VM, but instead only allocates the vCPUs in use. A new per-VM sx lock is added to serialize attempts to allocate vCPUs on first use. However, a given vCPU is never freed while the VM is active, so the pointer is read via an unlocked read first to avoid the need for the lock in the common case once the vCPU has been created. Some ioctls need to lock all vCPUs. To prevent races with ioctls that want to allocate a new vCPU, these ioctls also lock the sx lock that protects vCPU creation. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37174
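The allocate-on-first-use pattern described above (unlocked read first, lock only on the slow path) can be sketched as follows. This is a Python model with hypothetical names; `threading.Lock` stands in for the per-VM sx lock, and the dictionary stands in for a real vCPU structure.

```python
import threading


class VM:
    """Toy model of a VM whose vCPU slots are filled lazily."""

    def __init__(self, maxcpus: int):
        self.vcpus = [None] * maxcpus        # array of pointers, initially empty
        self.vcpus_lock = threading.Lock()   # stand-in for the per-VM sx lock
        self.allocations = 0

    def alloc_vcpu(self, vcpuid: int):
        if not 0 <= vcpuid < len(self.vcpus):
            raise ValueError("vcpuid out of range")
        # Fast path: a vCPU is never freed while the VM is active,
        # so an unlocked read of a non-empty slot is safe.
        vcpu = self.vcpus[vcpuid]
        if vcpu is not None:
            return vcpu
        # Slow path: serialize creation and re-check under the lock.
        with self.vcpus_lock:
            vcpu = self.vcpus[vcpuid]
            if vcpu is None:
                vcpu = {"id": vcpuid}
                self.vcpus[vcpuid] = vcpu
                self.allocations += 1
            return vcpu
```

The re-check under the lock is what prevents two racing callers from creating the same vCPU twice.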
|
#
c0f35dbf |
|
18-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs. Retire the boot_state member of struct vlapic and instead use a cpuset in the VM to track vCPUs waiting for STARTUP IPIs. INIT IPIs add vCPUs to this set, and STARTUP IPIs remove vCPUs from the set. STARTUP IPIs are only reported to userland for vCPUs that were removed from the set. In particular, this permits a subsequent change to allocate vCPUs on demand when the vCPU may not be allocated until after a STARTUP IPI is reported to userland. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37173
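The INIT/STARTUP bookkeeping described above amounts to simple set operations. The following Python sketch uses hypothetical names and a plain `set` in place of a `cpuset_t`; it models only the tracking rule, not IPI delivery.

```python
class StartupTracker:
    """Tracks which vCPUs are waiting for a STARTUP IPI (toy model)."""

    def __init__(self):
        self.waiting = set()

    def init_ipi(self, dest):
        # An INIT IPI adds its destination vCPUs to the waiting set.
        self.waiting |= set(dest)

    def startup_ipi(self, dest):
        # A STARTUP IPI removes vCPUs from the set; only those actually
        # removed are reported to userland.
        reported = self.waiting & set(dest)
        self.waiting -= reported
        return reported
```

A second STARTUP IPI to the same vCPU therefore reports nothing, because that vCPU is no longer in the set.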
|
#
08ebb360 |
|
18-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm: Destroy mutexes. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37171
|
#
d5118d0f |
|
18-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm stat: Add a special nelems constant for arrays sized by vCPU count. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37170
|
#
3f0f4b15 |
|
18-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm: Lookup vcpu pointers in vmmdev_ioctl. Centralize mapping vCPU IDs to struct vcpu objects in vmmdev_ioctl and pass vcpu pointers to the routines in vmm.c. For operations that want to perform an action on all vCPUs or on a single vCPU, pass pointers to both the VM and the vCPU using a NULL vCPU pointer to request global actions. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37168
|
#
e42c24d5 |
|
18-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm: Remove unused vcpuid argument from vioapic_process_eoi. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37166
|
#
d8be3d52 |
|
18-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm: Use struct vcpu in the rendezvous code. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37165
|
#
80cb5d84 |
|
18-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm: Pass vcpu instead of vm and vcpuid to APIs used from CPU backends. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37162
|
#
d3956e46 |
|
18-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm: Use struct vcpu in the instruction emulation code. This passes struct vcpu down in place of struct vm and an integer vcpu index through the in-kernel instruction emulation code. To minimize userland disruption, helper macros are used for the vCPU arguments passed into and through the shared instruction emulation code. A few other APIs used by the instruction emulation code have also been updated to accept struct vcpu in the kernel including vm_get/set_register and vm_inject_fault. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37161
|
#
3dc3d32a |
|
18-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm: Use struct vcpu with the vmm_stat API. The function callbacks still use struct vm and a vCPU index. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37157
|
#
d030f941 |
|
18-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm: Use VLAPIC_CTR* in more places. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37155
|
#
35abc6c2 |
|
18-Nov-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm: Use vm_get_maxcpus() instead of VM_MAXCPU in various places. Mostly these are loops that iterate over all possible vCPU IDs for a specific virtual machine. Reviewed by: corvink, markj Differential Revision: https://reviews.freebsd.org/D37147
|
#
769b884e |
|
26-Oct-2022 |
John Baldwin <jhb@FreeBSD.org> |
vmm: Fix AP startup with old userspace binaries. Older binaries that do not request IPI exits to userspace do not start user threads for other vCPUs until a STARTUP IPI triggers a VM_EXITCODE_SPINUP_AP exit to userland. This means that those vCPUs are not yet active (in terms of vm_active_cpus) when the INIT and STARTUP IPIs are delivered to the vCPUs. The changes in commit 0bda8d3e9f7a changed the INIT and STARTUP IPIs to reuse the existing vlapic_calcdest() function. This function silently ignores IPIs sent to inactive vCPUs. As a result, when using an old bhyve binary, the INIT and STARTUP IPIs sent to wake up APs were ignored. To fix, restructure the compat code for the INIT and STARTUP IPIs to ignore the results of vlapic_calcdest() and manually parse the APIC ID and resulting vcpuid. As part of this, make the compat code always conditional on the ipi_exit capability being disabled. Reviewed by: c.koehne_beckhoff.com, markj Differential Revision: https://reviews.freebsd.org/D37093
|
#
2a2a64c4 |
|
12-Oct-2022 |
Corvin Köhne <c.koehne@beckhoff.com> |
vmm: validate icr value Not all combinations of icr values are allowed. Neither Intel nor AMD document what happens when an invalid value is written to the icr. Ignore the IPI. So, the guest will note that the IPI wasn't delivered. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D36946 Sponsored by: Beckhoff Automation GmbH & Co. KG
|
#
f56801d6 |
|
10-Oct-2022 |
Corvin Köhne <c.koehne@beckhoff.com> |
vmm: increase vlapic version macOS panics on APIC versions lower than 0x14. See https://opensource.apple.com/source/xnu/xnu-7195.81.3/osfmk/i386/lapic_native.c.auto.html Additionally, an upcoming commit will validate the icr values written by the guest. Older Intel processors allow some different combinations than the newer ones. AMD documents that only the newer combinations are allowed. So, bumping the version allows us to avoid a differentiation between AMD and Intel. Intel documents that processors newer than the P6 use the new combinations. Sadly, Intel does not document which APIC version belongs to those processors. Linux identifies newer APICs by a version greater than or equal to 0x14. Intel and AMD allow APIC versions between 0x10 and 0x15. So, using 0x14 seems to be fine. See https://github.com/torvalds/linux/blob/3eba620e7bd772a0c7dc91966cb107872b54a910/arch/x86/kernel/apic/apic.c#L238 Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D36945 Sponsored by: Beckhoff Automation GmbH & Co. KG
|
#
0bda8d3e |
|
07-Sep-2022 |
Corvin Köhne <CorvinK@beckhoff.com> |
vmm: permit some IPIs to be handled by userspace Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland. INIT and STARTUP IPIs are now returned to userland. Due to backward compatibility reasons, a new capability is added for enabling VM_EXITCODE_IPI. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D35623 Sponsored by: Beckhoff Automation GmbH & Co. KG
|
#
3fc17484 |
|
09-Sep-2022 |
Emmanuel Vadot <manu@FreeBSD.org> |
Revert "vmm: permit some IPIs to be handled by userspace" This reverts commit a5a918b7a906eaa88e0833eac70a15989d535b02. This causes problems with VMs using bhyveload. Reported by: pho, kp
|
#
83b65d0a |
|
09-Sep-2022 |
Emmanuel Vadot <manu@FreeBSD.org> |
Revert "vmm: Remove unneeded variable maxcpus" This reverts commit 653c36179d9ee587e4d5e4668fd73d6c3d318ef8.
|
#
653c3617 |
|
07-Sep-2022 |
Emmanuel Vadot <manu@FreeBSD.org> |
vmm: Remove unneeded variable maxcpus Reported by: FreeBSD User <freebsd@walstatt-de.de> Fixes: a5a918b7a906 ("vmm: permit some IPIs to be handled by userspace")
|
#
a5a918b7 |
|
07-Sep-2022 |
Corvin Köhne <CorvinK@beckhoff.com> |
vmm: permit some IPIs to be handled by userspace Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland. INIT and Startup IPIs are now returned to userland. Due to backward compatibility reasons, a new capability is added for enabling VM_EXITCODE_IPI. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D35623 Sponsored by: Beckhoff Automation GmbH & Co. KG
|
#
2062ce99 |
|
28-Feb-2022 |
Robert Wing <rew@FreeBSD.org> |
vmm: fix "set but not used" warnings
|
#
c72e914c |
|
11-Jan-2022 |
Vitaliy Gusev <gusev.vitaliy@gmail.com> |
vmm: vlapic resume can eat 100% CPU by vlapic_callout_handler Suspend/resume of Win10 leaves CPU0 busy handling interrupts. Win10 does not use the LAPIC timer very often, and in most cases I see it disabled by writing 0 to the Initial Count Register (for Timer). During resume, restart the timer only for an enabled LAPIC whose timer is also enabled. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33448
|
#
4c812fe6 |
|
19-Oct-2021 |
Mark Johnston <markj@FreeBSD.org> |
vlapic: Schedule callouts on the local CPU The virtual LAPIC driver uses callouts to implement the LAPIC timer. Callouts are armed using callout_reset_sbt(), which currently puts everything on CPU 0. On systems running many bhyve VMs this results in a large amount of contention for CPU 0's callout lock. Modify vlapic to schedule callouts on the local CPU instead. This allows timer interrupts to be scheduled more evenly among CPUs where bhyve is running. Reviewed by: grehan, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32559
|
#
de855429 |
|
21-Sep-2021 |
Mark Johnston <markj@FreeBSD.org> |
cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it This implementation is faster and doesn't modify the cpuset, so it lets us avoid some unnecessary copying as well. No functional change intended. This is a re-application of commit 9068f6ea697b1b28ad1326a4c7a9ba86f08b985e. Reviewed by: cem, kib, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32029
|
#
bcdc599d |
|
21-Sep-2021 |
Mark Johnston <markj@FreeBSD.org> |
Revert "cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it" This reverts commit 9068f6ea697b1b28ad1326a4c7a9ba86f08b985e. The underlying macro needs to be reworked to avoid problems with control flow statements. Reported by: rlibby
|
#
9068f6ea |
|
21-Sep-2021 |
Mark Johnston <markj@FreeBSD.org> |
cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it This implementation is faster and doesn't modify the cpuset, so it lets us avoid some unnecessary copying as well. No functional change intended. Reviewed by: cem, kib, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32029
|
#
543769bf |
|
01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
amd64: clean up empty lines in .c and .h files
|
#
483d953a |
|
04-May-2020 |
John Baldwin <jhb@FreeBSD.org> |
Initial support for bhyve save and restore. Save and restore (also known as suspend and resume) permits a snapshot to be taken of a guest's state that can later be resumed. In the current implementation, bhyve(8) creates a UNIX domain socket that is used by bhyvectl(8) to send a request to save a snapshot (and optionally exit after the snapshot has been taken). A snapshot currently consists of two files: the first holds a copy of guest RAM, and the second file holds other guest state such as vCPU register values and device model state. To resume a guest, bhyve(8) must be started with a matching pair of command line arguments to instantiate the same set of device models as well as a pointer to the saved snapshot. While the current implementation is useful for several use cases, it has a few limitations. The file format for saving the guest state is tied to the ABI of internal bhyve structures and is not self-describing (in that it does not communicate the set of device models present in the system). In addition, the state saved for some device models closely matches the internal data structures which might prove a challenge for compatibility of snapshot files across a range of bhyve versions. The file format also does not currently support versioning of individual chunks of state. As a result, the current file format is not a fixed binary format and future revisions to save and restore will break binary compatibility of snapshot files. The goal is to move to a more flexible format that adds versioning, etc. and at that point to commit to providing a reasonable level of compatibility. As a result, the current implementation is not enabled by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option for userland builds, and the kernel option BHYVE_SNAPSHOT.
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz Relnotes: yes Sponsored by: University Politehnica of Bucharest Sponsored by: Matthew Grooms (student scholarships) Sponsored by: iXsystems Differential Revision: https://reviews.freebsd.org/D19495
|
#
1bc51bad |
|
10-Mar-2020 |
Michael Reifenberger <mr@FreeBSD.org> |
Untangle TPR shadowing and APIC virtualization. This speeds up Windows guests tremendously. The patch does: Add a new tunable 'hw.vmm.vmx.use_tpr_shadowing' to disable TPR shadowing. Also add 'hw.vmm.vmx.cap.tpr_shadowing' to be able to query if TPR shadowing is used. Detach the initialization of TPR shadowing from the initialization of APIC virtualization. APIC virtualization still needs TPR shadowing, but not vice versa. Any CPU that supports APIC virtualization should also support TPR shadowing. When TPR shadowing is used, the APIC page of each vCPU is written to the VMCS_VIRTUAL_APIC field of the VMCS so that the CPU can write directly to the page without intercept. On vm exit, vlapic_update_ppr() is called to update the PPR. Submitted by: Yamagi Burmeister MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22942
|
#
6a1e1c2c |
|
29-Aug-2019 |
John Baldwin <jhb@FreeBSD.org> |
Simplify bhyve vlapic ESR logic. The bhyve virtual local APIC uses an instance-global flag to indicate when an error LVT is being delivered to prevent infinite recursion. Use a function argument instead to reduce the amount of instance-global state. This was inspired by reviewing the bhyve save/restore work, which saves a copy of the instance-global state for each vlapic. Smart OS bug: https://smartos.org/bugview/OS-7777 Submitted by: Patrick Mooney Reviewed by: markj, rgrimes Obtained from: SmartOS / Joyent Differential Revision: https://reviews.freebsd.org/D20365
|
#
ba084c18 |
|
13-Aug-2019 |
Ed Maste <emaste@FreeBSD.org> |
sys/{x86,amd64}: remove one of doubled ;s MFC after: 1 week
|
#
e5506316 |
|
03-Aug-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
bhyve: Ignore MSI/MSI-X interrupts sent to non-active vCPUs in physical destination mode. This is mostly a nop, because the vmm initializes all vCPUs up to vm_maxcpus, so even if the target CPU is not active, lapic/vlapic code still has the valid data to use. As John notes, dropping such interrupts more closely matches the real hardware, which ignores all interrupts for APs that have not been started. Reviewed by: jhb admbugs: 837 MFC after: 1 week Sponsored by: The FreeBSD Foundation
|
#
a488c9c9 |
|
25-Apr-2019 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
Add accessor function for vm->maxcpus Replace most VM_MAXCPU constant uses with an accessor function to vm->maxcpus which for now is initialized and kept at the value of VM_MAXCPU. This is a rework of Fabian Freyer's (fabian.freyer_physik.tu-berlin.de) work from D10070 to adjust it for the cpu topology changes that occurred in r332298 Submitted by: Fabian Freyer (fabian.freyer_physik.tu-berlin.de) Reviewed by: Patrick Mooney <patrick.mooney@joyent.com> Approved by: bde (mentor), jhb (maintainer) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D18755
|
#
c49761dd |
|
27-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys/amd64: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.
|
#
28323add |
|
08-Nov-2016 |
Bryan Drewery <bdrewery@FreeBSD.org> |
Fix improper use of "its". Sponsored by: Dell EMC Isilon
|
#
500eb14a |
|
03-May-2016 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
vmm(4): Small spelling fixes. Reviewed by: grehan
|
#
44e2f0fe |
|
01-May-2015 |
Neel Natu <neel@FreeBSD.org> |
r281630 relaxed the limits on the vectors that can be asserted in the IRRs. Do the same when transitioning a vector from the IRR to the ISR and also when extinguishing it from the ISR in response to an EOI. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 2 weeks
|
#
18a2b08e |
|
13-Mar-2015 |
Neel Natu <neel@FreeBSD.org> |
Use lapic_ipi_alloc() to dynamically allocate IPI slots needed by bhyve when vmm.ko is loaded. Also relocate the 'justreturn' IPI handler to be alongside all other handlers. Requested by: kib
|
#
9d8d8e3e |
|
19-Sep-2014 |
Neel Natu <neel@FreeBSD.org> |
Add some more KTR events to help debugging.
|
#
79ad53fb |
|
15-Sep-2014 |
Neel Natu <neel@FreeBSD.org> |
Use V_IRQ, V_INTR_VECTOR and V_TPR to offload APIC interrupt delivery to the processor. Briefly, the hypervisor sets V_INTR_VECTOR to the APIC vector and sets V_IRQ to 1 to indicate a pending interrupt. The hardware then takes care of injecting this vector when the guest is able to receive it. Legacy PIC interrupts are still delivered via the event injection mechanism. This is because the vector injected by the PIC must reflect the state of its pins at the time the CPU is ready to accept the interrupt. Accesses to the TPR via %CR8 are handled entirely in hardware. This requires that the emulated TPR must be synced to V_TPR after a #VMEXIT. The guest can also modify the TPR via the memory mapped APIC. This requires that the V_TPR must be synced with the emulated TPR before a VMRUN. Reviewed by: Anish Gupta (akgupt3@gmail.com)
|
#
051f2bd1 |
|
09-Jun-2014 |
Neel Natu <neel@FreeBSD.org> |
Add reserved bit checking when doing %CR8 emulation and inject #GP if required. Pointed out by: grehan Reviewed by: tychon
|
#
594db002 |
|
06-Jun-2014 |
Tycho Nightingale <tychon@FreeBSD.org> |
Support guest accesses to %cr8. Reviewed by: neel
|
#
95ebc360 |
|
31-May-2014 |
Neel Natu <neel@FreeBSD.org> |
Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing it implicitly in vmm.ko. Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and "--get-suspended-cpus" options. This is in preparation for being able to reset virtual machine state without having to destroy and recreate it.
|
#
c5d216b7 |
|
23-Apr-2014 |
Neel Natu <neel@FreeBSD.org> |
Change the vlapic timer frequency to be in the ballpark of contemporary hardware. This also decouples the vlapic emulation from the host's TSC frequency. Requested by: grehan@
|
#
0775fbb4 |
|
15-Mar-2014 |
Tycho Nightingale <tychon@FreeBSD.org> |
Fix a race wherein the source of an interrupt vector is wrongly attributed if an ExtINT arrives during interrupt injection. Also, fix a spurious interrupt if the PIC tries to raise an interrupt before the outstanding one is accepted. Finally, improve the PIC interrupt latency when another interrupt is raised immediately after the outstanding one is accepted by creating a vmexit rather than waiting for one to occur by happenstance. Approved by: neel (co-mentor)
|
#
1ed19b83 |
|
11-Mar-2014 |
Tycho Nightingale <tychon@FreeBSD.org> |
Don't try to return a vector to a caller that only cares if a vector is pending or not. Approved by: neel (co-mentor)
|
#
762fd208 |
|
11-Mar-2014 |
Tycho Nightingale <tychon@FreeBSD.org> |
Replace the userspace atpic stub with a more functional vmm.ko model. New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ can be used to manipulate the pic, and optionally the ioapic, pin state. Reviewed by: jhb, neel Approved by: neel (co-mentor)
|
#
159dd56f |
|
20-Feb-2014 |
Neel Natu <neel@FreeBSD.org> |
Add support for x2APIC virtualization assist in Intel VT-x. The vlapic.ops handler 'enable_x2apic_mode' is called when the vlapic mode is switched to x2APIC. The VT-x implementation of this handler turns off the APIC-access virtualization and enables the x2APIC virtualization in the VMCS. The x2APIC virtualization is done by allowing guest read access to a subset of MSRs in the x2APIC range. In non-root operation the processor will satisfy an 'rdmsr' access to these MSRs by reading from the virtual APIC page instead. The guest is also given write access to TPR, EOI and SELF_IPI MSRs which get special treatment in non-root operation. This is documented in the Intel SDM section titled "Virtualizing MSR-Based APIC Accesses". Enforce that APIC-write and APIC-access VM-exits are handled only if APIC-access virtualization is enabled. The one exception to this is SELF_IPI virtualization which may result in an APIC-write VM-exit.
|
#
52e5c8a2 |
|
19-Feb-2014 |
Neel Natu <neel@FreeBSD.org> |
Simplify APIC mode switching from MMIO to x2APIC. In part this is done to simplify the implementation of the x2APIC virtualization assist in VT-x. Prior to this change the vlapic allowed the guest to change its mode from xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked when the virtual machine is created. This is not very constraining because operating systems already have to deal with BIOS setting up the APIC in x2APIC mode at boot. Fix a bug in the CPUID emulation where the x2APIC capability was leaking from the host to the guest. Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore MSR accesses to the vlapic when it is in xAPIC mode. The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8) can be used to change the mode to x2APIC instead. Discussed with: grehan@
|
#
294d0d88 |
|
17-Feb-2014 |
Neel Natu <neel@FreeBSD.org> |
Handle writes to the SELF_IPI MSR by the guest when the vlapic is configured in x2apic mode. Reads to this MSR are currently ignored but should cause a general protection exception to be injected into the vcpu. All accesses to the corresponding offset in xAPIC mode are ignored. Also, do not panic the host if there is a mismatch between the trigger mode programmed in the TMR and the actual interrupt being delivered. Instead the anomaly is logged to aid debugging and to prevent a misbehaving guest from panicking the host.
|
#
e9ed7bc4 |
|
03-Feb-2014 |
Peter Grehan <grehan@FreeBSD.org> |
Roll back botched partial MFC :(
|
#
30b94db8 |
|
25-Jan-2014 |
Neel Natu <neel@FreeBSD.org> |
Support level triggered interrupts with VT-x virtual interrupt delivery. The VMCS field EOI_bitmap[] is an array of 256 bits - one for each vector. If a bit is set to '1' in the EOI_bitmap[] then the processor will trigger an EOI-induced VM-exit when it is doing EOI virtualization. The EOI-induced VM-exit results in the EOI being forwarded to the vioapic so that level triggered interrupts can be properly handled. Tested by: Anish Gupta (akgupt3@gmail.com)
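A 256-bit, one-bit-per-vector bitmap like the one described above is commonly stored as four 64-bit words. The following Python sketch shows the indexing only; the class name and method names are hypothetical, and the four-word layout is an assumption of this sketch rather than a statement about the VMCS encoding.

```python
class EoiExitBitmap:
    """256-bit bitmap, one bit per interrupt vector (toy model)."""

    WORD_BITS = 64

    def __init__(self):
        self.words = [0] * 4  # 4 x 64 bits = 256 vectors

    def _locate(self, vector: int):
        if not 0 <= vector <= 255:
            raise ValueError("vector out of range")
        return vector // self.WORD_BITS, 1 << (vector % self.WORD_BITS)

    def set(self, vector: int) -> None:
        # Mark a vector so that its EOI is forwarded (level-triggered case).
        idx, mask = self._locate(vector)
        self.words[idx] |= mask

    def clear(self, vector: int) -> None:
        idx, mask = self._locate(vector)
        self.words[idx] &= ~mask

    def is_set(self, vector: int) -> bool:
        idx, mask = self._locate(vector)
        return bool(self.words[idx] & mask)
```

In this model, setting the bit for a level-triggered vector is what would cause its EOI to reach the vioapic; edge-triggered vectors leave their bit clear.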
|
#
5b8a8cd1 |
|
13-Jan-2014 |
Neel Natu <neel@FreeBSD.org> |
Add an API to rendezvous all active vcpus in a virtual machine. The rendezvous can be initiated in the context of a vcpu thread or from the bhyve(8) control process. The first use of this functionality is to update the vlapic trigger-mode register when the IOAPIC pin configuration is changed. Prior to this change we would update the TMR in the virtual-APIC page at the time of interrupt delivery. But this doesn't work with Posted Interrupts because there is no way to program the EOI_exit_bitmap[] in the VMCS of the target at the time of interrupt delivery. Discussed with: grehan@
|
#
add611fd |
|
08-Jan-2014 |
Neel Natu <neel@FreeBSD.org> |
Don't expose 'vmm_ipinum' as a global.
|
#
88c4b8d1 |
|
07-Jan-2014 |
Neel Natu <neel@FreeBSD.org> |
Use the 'Virtual Interrupt Delivery' feature of Intel VT-x if supported by hardware. It is possible to turn this feature off and fall back to software emulation of the APIC by setting the tunable hw.vmm.vmx.use_apic_vid to 0. We now start handling two new types of VM-exits: APIC-access: This is a fault-like VM-exit and is triggered when the APIC register access is not accelerated (e.g. apic timer CCR). In response to this we do emulate the instruction that triggered the APIC-access exit. APIC-write: This is a trap-like VM-exit which does not require any instruction emulation but it does require the hypervisor to emulate the access to the specified register (e.g. icrlo register). Introduce 'vlapic_ops' which are function pointers to vector the various vlapic operations into processor-dependent code. The 'Virtual Interrupt Delivery' feature installs 'ops' for setting the IRR bits in the virtual APIC page and to return whether any interrupts are pending for this vcpu. Tested on an "Intel Xeon E5-2620 v2" courtesy of Allan Jude at ScaleEngine.
|
#
4d1e82a8 |
|
06-Jan-2014 |
Neel Natu <neel@FreeBSD.org> |
Allow vlapic_set_intr_ready() to return a value that indicates whether or not the vcpu should be kicked to process a pending interrupt. This will be useful in the implementation of the Posted Interrupt APICv feature. Change the return value of 'vlapic_pending_intr()' to indicate whether or not an interrupt is available to be delivered to the vcpu depending on the value of the PPR. Add KTR tracepoints to debug guest IPI delivery.
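The PPR-dependent check can be sketched as below. This Python model uses hypothetical function names and assumes the usual local APIC rule that a pending vector is deliverable only when its priority class (bits 7:4) is strictly greater than the priority class of the PPR; it is not a copy of the vlapic code.

```python
def priority_class(value: int) -> int:
    # The priority class is the upper four bits of a vector or of the PPR.
    return (value >> 4) & 0xF


def pending_intr(irr_vectors, ppr: int):
    """Return the deliverable vector, or None if nothing can be delivered.

    irr_vectors is the set of vectors currently requested in the IRR.
    """
    if not irr_vectors:
        return None
    vector = max(irr_vectors)  # the highest pending vector wins
    if priority_class(vector) > priority_class(ppr):
        return vector
    return None
```

A vector in the same priority class as the PPR stays pending until the PPR drops.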
|
#
7c05bc31 |
|
27-Dec-2013 |
Neel Natu <neel@FreeBSD.org> |
Modify handling of writes to the vlapic LVT registers. The handler is now called after the register value is updated in the virtual APIC page. This will make it easier to handle APIC-write VM-exits with APIC register virtualization turned on. This also implies that we need to keep a snapshot of the last value written to a LVT register. We can no longer rely on the LVT registers in the APIC page to be "clean" because the guest can write anything to it before the hypervisor has had a chance to sanitize it.
|
#
fafe8844 |
|
27-Dec-2013 |
Neel Natu <neel@FreeBSD.org> |
Modify handling of writes to the vlapic ICR_TIMER, DCR_TIMER, ICRLO and ESR registers. The handler is now called after the register value is updated in the virtual APIC page. This will make it easier to handle APIC-write VM-exits with APIC register virtualization turned on. We can no longer rely on the value of 'icr_timer' on the APIC page in the callout handler. With APIC register virtualization the value of 'icr_timer' will be updated by the processor in guest-context before an APIC-write VM-exit. Clear the 'delivery status' bit in the ICRLO register in the write handler. With APIC register virtualization the write happens in guest-context and we cannot prevent a (buggy) guest from setting this bit.
|
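As a rough illustration of the ICRLO sanitizing step in commit fafe8844: the delivery status flag is bit 12 of the low ICR word, so the write handler can mask it off after the guest's value has landed in the APIC page. The macro and function names below are hypothetical and do not come from the vmm.ko sources.

```c
#include <stdint.h>

/* Delivery status is bit 12 of the low ICR word (name is hypothetical). */
#define	SKETCH_ICRLO_DELSTAT	(1u << 12)

/*
 * Hypothetical sanitizer: drop the delivery status bit that a guest may
 * have set, leaving every other field of the written value untouched.
 */
static uint32_t
sketch_sanitize_icrlo(uint32_t icrlo)
{
	return (icrlo & ~SKETCH_ICRLO_DELSTAT);
}
```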
#
2c52dcd9 |
|
27-Dec-2013 |
Neel Natu <neel@FreeBSD.org> |
Modify handling of writes to the vlapic SVR register. The handler is now called after the register value is updated in the virtual APIC page. This will make it easier to handle APIC-write VM-exits with APIC register virtualization turned on. Additionally, mask all the LVT entries when the vlapic is software-disabled.
|
#
3f0ddc7c |
|
26-Dec-2013 |
Neel Natu <neel@FreeBSD.org> |
Modify handling of writes to the vlapic ID, LDR and DFR registers. The handlers are now called after the register value is updated in the virtual APIC page. This will make it easier to handle APIC-write VM-exits with APIC register virtualization turned on. Additionally, we need to ensure that the value of these registers is always correctly reflected in the virtual APIC page, because there is no VM exit when the guest reads these registers with APIC register virtualization.
|
#
de5ea6b6 |
|
24-Dec-2013 |
Neel Natu <neel@FreeBSD.org> |
vlapic code restructuring to make it easy to support hardware-assist for APIC emulation. The vlapic initialization and cleanup is done via processor specific vmm_ops. This will allow the VT-x/SVM modules to layer any hardware-assist for APIC emulation or virtual interrupt delivery on top of the vlapic device model. Add a parameter to 'vcpu_notify_event()' to distinguish between vlapic interrupts versus other events (e.g. NMI). This provides an opportunity to use hardware-assists like Posted Interrupts (VT-x) or doorbell MSR (SVM) to deliver an interrupt to a guest without causing a VM-exit. Get rid of lapic_pending_intr() and lapic_intr_accepted() and use the vlapic_xxx() counterparts directly. Associate an 'Apic Page' with each vcpu and reference it from the 'vlapic'. The 'Apic Page' is intended to be referenced from the Intel VMCS as the 'virtual APIC page' or from the AMD VMCB as the 'vAPIC backing page'.
|
#
330baf58 |
|
23-Dec-2013 |
John Baldwin <jhb@FreeBSD.org> |
Extend the support for local interrupts on the local APIC: - Add a generic routine to trigger an LVT interrupt that supports both fixed and NMI delivery modes. - Add an ioctl and bhyvectl command to trigger local interrupts inside a guest. In particular, a global NMI similar to that raised by SERR# or PERR# can be simulated by asserting LINT1 on all vCPUs. - Extend the LVT table in the vCPU local APIC to support CMCI. - Flesh out the local APIC error reporting a bit to cache errors and report them via ESR when ESR is written to. Add support for asserting the error LVT when an error occurs. Raise illegal vector errors when attempting to signal an invalid vector for an interrupt or when sending an IPI. - Ignore writes to reserved bits in LVT entries. - Export table entries in the MADT and MP Table advertising the stock x86 config of LINT0 set to ExtInt and LINT1 wired to NMI. Reviewed by: neel (earlier version)
|
#
a7835785 |
|
21-Dec-2013 |
Neel Natu <neel@FreeBSD.org> |
Consolidate the virtual apic initialization in a single function: vlapic_reset()
|
#
4f8be175 |
|
16-Dec-2013 |
Neel Natu <neel@FreeBSD.org> |
Add an API to deliver message signalled interrupts to vcpus. This allows callers to treat the MSI 'addr' and 'data' fields as opaque and also lets bhyve implement multiple destination modes: physical, flat and clustered. Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com) Reviewed by: grehan@
|
#
a83011d2 |
|
10-Dec-2013 |
Neel Natu <neel@FreeBSD.org> |
Fix typo when initializing the vlapic version register ('<<' instead of '<').
|
#
becd9849 |
|
10-Dec-2013 |
Neel Natu <neel@FreeBSD.org> |
Fix x2apic support in bhyve. When the guest is bringing up the APs in the x2APIC mode a write to the ICR register will now trigger a return to userspace with an exitcode of VM_EXITCODE_SPINUP_AP. This gets SMP guests working again with x2APIC. Change the vlapic timer lock to be a spinlock because the vlapic can be accessed from within a critical section (vm run loop) when guest is using x2apic mode. Reviewed by: grehan@
|
#
fb03ca4e |
|
07-Dec-2013 |
Neel Natu <neel@FreeBSD.org> |
Use callout(9) to drive the vlapic timer instead of clocking it on each VM exit. This decouples the guest's 'hz' from the host's 'hz' setting. For example, it is now possible to have a guest run at 'hz=1000' while the host is at 'hz=100'. Discussed with: grehan@ Tested by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
|
#
1c052192 |
|
07-Dec-2013 |
Neel Natu <neel@FreeBSD.org> |
If a vcpu disables its local apic and then executes a 'HLT' then spin down the vcpu and destroy its thread context. Also modify the 'HLT' processing to ignore pending interrupts in the IRR if interrupts have been disabled by the guest. The interrupt cannot be injected into the guest in any case so resuming it is futile. With this change "halt" from a Linux guest works correctly. Reviewed by: grehan@ Tested by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
|
#
b5b28fc9 |
|
27-Nov-2013 |
Neel Natu <neel@FreeBSD.org> |
Add support for level triggered interrupt pins on the vioapic. Prior to this commit level triggered interrupts would work as long as the pin was not shared among multiple interrupt sources. The vlapic now keeps track of level triggered interrupts in the trigger mode register and will forward the EOI for a level triggered interrupt to the vioapic. The vioapic in turn uses the EOI to sample the level on the pin and re-inject the vector if the pin is still asserted. The vhpet is the first consumer of level triggered interrupts and advertises that it can generate interrupts on pins 20 through 23 of the vioapic. Discussed with: grehan@
|
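The level-triggered flow described in commit b5b28fc9 can be sketched as follows. This is a simplified model under stated assumptions, not the vlapic code: the structure, the function names and the "return the vector to forward" convention are hypothetical, and accepting an interrupt sets the ISR bit directly instead of going through the IRR first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified local APIC state: in-service and trigger mode. */
struct sketch_lapic {
	uint32_t isr[8];
	uint32_t tmr[8];
};

/* Record an accepted vector and remember whether it was level triggered. */
static void
sketch_accept(struct sketch_lapic *l, int vector, bool level)
{
	uint32_t mask = 1u << (vector % 32);

	l->isr[vector / 32] |= mask;
	if (level)
		l->tmr[vector / 32] |= mask;
	else
		l->tmr[vector / 32] &= ~mask;
}

/*
 * Process an EOI: retire the highest in-service vector. Returns that vector
 * if it was level triggered (the caller would forward the EOI to the I/O
 * APIC model so it can resample the pin), or -1 if nothing needs forwarding.
 */
static int
sketch_eoi(struct sketch_lapic *l)
{
	for (int i = 7; i >= 0; i--) {
		if (l->isr[i] == 0)
			continue;

		int bit = 31;
		while ((l->isr[i] & (1u << bit)) == 0)
			bit--;

		l->isr[i] &= ~(1u << bit);
		if (l->tmr[i] & (1u << bit))
			return (i * 32 + bit);
		return (-1);
	}
	return (-1);
}
```

In this model an edge-triggered EOI stays inside the local APIC, while a level-triggered one is handed back so the pin can be sampled and the vector re-injected if the pin is still asserted.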
#
03cd0501 |
|
04-Nov-2013 |
Neel Natu <neel@FreeBSD.org> |
Remove the 'vdev' abstraction that was meant to sit on top of device models in the kernel. This abstraction was redundant because the only device emulated inside vmm.ko is the local apic and it is always at a fixed guest physical address. Discussed with: grehan
|
#
513c8d33 |
|
30-Oct-2013 |
Neel Natu <neel@FreeBSD.org> |
Rename the VMM_CTRx() family of macros to VCPU_CTRx() to highlight that these tracepoints are vcpu-specific. Add support for tracepoints that are global to the virtual machine - these tracepoints are called VM_CTRx().
|
#
1e2751dd |
|
18-Jun-2013 |
Sergey Kandaurov <pluknet@FreeBSD.org> |
Fix a gcc warning uncovered after r251745. Reported by: Sergey V. Dyatko Reviewed by: neel
|
#
82f2974a |
|
14-Jun-2013 |
Sergey Kandaurov <pluknet@FreeBSD.org> |
Replace cpusetffs_obj with CPU_FFS, missed in r251703. Reported by: bdrewery, O. Hartmann
|
#
0acb0d84 |
|
09-May-2013 |
Neel Natu <neel@FreeBSD.org> |
Support array-type of stats in bhyve. An array-type stat in vmm.ko is defined as follows: VMM_STAT_ARRAY(IPIS_SENT, VM_MAXCPU, "ipis sent to vcpu"); It is incremented as follows: vmm_stat_array_incr(vm, vcpuid, IPIS_SENT, array_index, 1); And output of 'bhyvectl --get-stats' looks like: ipis sent to vcpu[0] 3114 ipis sent to vcpu[1] 0 Reviewed by: grehan Obtained from: NetApp
|
#
117e8f37 |
|
05-Apr-2013 |
Peter Grehan <grehan@FreeBSD.org> |
Don't panic when a valid divisor of 1 has been requested. Obtained from: NetApp
|
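For context on the divisor fix in commit 117e8f37, the sketch below decodes the APIC timer Divide Configuration Register. The function name is hypothetical and this is not the vmm.ko code. The encoding is the architectural one: bits 0, 1 and 3 form a 3-bit code, where codes 0 through 6 select divide-by-2 through divide-by-128 and code 7 selects divide-by-1.

```c
#include <stdint.h>

/*
 * Hypothetical decoder for the timer Divide Configuration Register.
 * Bits 0, 1 and 3 form the 3-bit code; bit 2 is reserved and ignored.
 */
static int
sketch_timer_divisor(uint32_t dcr)
{
	uint32_t code = (dcr & 0x3) | ((dcr >> 1) & 0x4);

	if (code == 7)
		return (1);		/* 0b1011: divide by 1 */
	return (2 << code);		/* 2, 4, 8, ..., 128 */
}
```

A divisor of 1 is the one value that does not follow the power-of-two progression, which is why it needs its own case.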
#
77d8fd9b |
|
30-Mar-2013 |
Neel Natu <neel@FreeBSD.org> |
Add counter to keep track of the number of timer interrupts generated by the local apic for each virtual cpu.
|
#
485f986a |
|
15-Dec-2012 |
Neel Natu <neel@FreeBSD.org> |
Modify the default behavior of bhyve such that it no longer forces the use of x2apic mode on the guest. The guest can decide whether or not it wants to use legacy mmio or x2apic access to the APIC by writing to the MSR_APICBASE register. Obtained from: NetApp
|
#
2e25737a |
|
20-Oct-2012 |
Neel Natu <neel@FreeBSD.org> |
Calculate the number of host ticks until the next guest timer interrupt. This information will be used in conjunction with guest "HLT exiting" to yield the thread hosting the virtual cpu. Obtained from: NetApp
|
#
73820fb0 |
|
25-Sep-2012 |
Neel Natu <neel@FreeBSD.org> |
Add an option "-a" to present the local apic in the XAPIC mode instead of the default X2APIC mode to the guest.
|
#
a2da7af6 |
|
25-Sep-2012 |
Neel Natu <neel@FreeBSD.org> |
Add support for trapping MMIO writes to local apic registers and emulating them. The default behavior is still to present the local apic to the guest in the x2apic mode.
|
#
edf89256 |
|
24-Sep-2012 |
Neel Natu <neel@FreeBSD.org> |
Add an explicit exit code 'SPINUP_AP' to tell the controlling process that an AP needs to be activated by spinning up an execution context for it. The local apic emulation is now completely done in the hypervisor and it will detect writes to the ICR_LO register that try to bring up the AP. In response to such writes it will return to userspace with an exit code of SPINUP_AP. Reviewed by: grehan
|
#
2d3a73ed |
|
20-Sep-2012 |
Neel Natu <neel@FreeBSD.org> |
Restructure the x2apic access code in preparation for supporting memory mapped access to the local apic. The vlapic code is now aware of the mode that the guest is using to access the local apic. Reviewed by: grehan@
|
#
cd942e0f |
|
28-Apr-2012 |
Peter Grehan <grehan@FreeBSD.org> |
MSI-x interrupt support for PCI pass-thru devices. Includes instruction emulation for memory r/w access. This opens the door for io-apic, local apic, hpet timer, and legacy device emulation. Submitted by: ryan dot berryhill at sandvine dot com Reviewed by: grehan Obtained from: Sandvine
|
#
2a5bbbe3 |
|
06-Mar-2012 |
Ed Maste <emaste@FreeBSD.org> |
Remove duplicated license text.
|
#
14ddf164 |
|
06-Jul-2011 |
Neel Natu <neel@FreeBSD.org> |
Get rid of redundant initialization of 'dmask'. It was being re-initialized shortly afterwards.
|
#
366f6083 |
|
12-May-2011 |
Peter Grehan <grehan@FreeBSD.org> |
Import of bhyve hypervisor and utilities, part 1. vmm.ko - kernel module for VT-x, VT-d and hypervisor control; bhyve - user-space sequencer and i/o emulation; vmmctl - dump of hypervisor register state; libvmm - front-end to vmm.ko chardev interface. bhyve was designed and implemented by Neel Natu. Thanks to the following folk from NetApp who helped to make this available: Joe CaraDonna, Peter Snyder, Jeff Heller, Sandeep Mann, Steve Miller, Brian Pawlowski
|