History log of /freebsd-10.3-release/sys/amd64/vmm/amd/svm.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 296373 04-Mar-2016 marius

- Copy stable/10@296371 to releng/10.3 in preparation for 10.3-RC1
builds.
- Update newvers.sh to reflect RC1.
- Update __FreeBSD_version to reflect 10.3.
- Update default pkg(8) configuration to use the quarterly branch.

Approved by: re (implicit)

# 295124 01-Feb-2016 grehan

MFC r284539, r284630, r284688, r284877, r285217, r285218,
r286837, r286838, r288470, r288522, r288524, r288826,
r289001

Pull in bhyve bug fixes and changes to allow UEFI booting.
This provides Windows support.

Tested on Intel and AMD with:
- Arch Linux i386+amd64 (kernel 4.3.3)
- Ubuntu 15.10 server 64-bit
- FreeBSD-CURRENT/amd64 20160127 snap
- FreeBSD 10.2 i386+amd64
- OpenBSD 5.8 i386+amd64
- SmartOS latest
- Windows 10 build 1511'

Huge thanks to Yamagi Burmeister who submitted the patch
and did the majority of the testing.

r284539 - bootrom mem allocation support
r284630 - Add SO_REUSEADDR when starting debug port
r284688 - Fix a regression in "movs" emulation
r284877 - verify_gla() non-zero segment base fix
r285217 - Always assert DCD and DSR in the uart
r285218 - devmem nodes moved to /dev/vmm.io/
r286837 - Add define for SATA Check-Power-Mode
r286838 - Add simple (no-op) SATA cmd emulations
r288470 - Increase virtio-blk indirect descs
r288522 - Firmware guest query interface
r288524 - Fix post-test typo
r288826 - Clean up SATA unimplemented cmd msg
r289001 - Add -l option to specify userboot path

Submitted by: Yamagi Burmeister
Approved by: re (kib)


# 285015 01-Jul-2015 neel

MFC r284712:
Restore the host's GS.base before returning from 'svm_launch()' so the Dtrace
FBT provider works with vmm.ko on AMD.


# 284900 28-Jun-2015 neel

MFC r282209:
Emulate the 'bit test' instruction.

MFC r282259:
Re-implement RTC current time calculation to eliminate the possibility of
losing time.

MFC r282281:
Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs.

MFC r282284:
When an instruction cannot be decoded just return to userspace so bhyve(8)
can dump the instruction bytes.

MFC r282287:
Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.

MFC r282296:
Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are
enabled.

MFC r282301:
Relax limits when transitioning a vector from the IRR to the ISR and also
when extinguishing it from the ISR in response to an EOI.

MFC r282335:
Advertise an additional memory BAR in the "dummy" device emulation.

MFC r282336:
Emulate machine check related MSRs to allow guest OSes like Windows to boot.

MFC r282351:
Don't advertise the Intel SMX capability to the guest.

MFC r282407:
Emulate the 'CMP r/m8, imm8' instruction.

MFC r282519:
Add macros for AMD-specific bits in MSR_EFER: LMSLE, FFXSR and TCE.

MFC r282520:
Emulate guest writes to EFER_MSR properly.

MFC r282558:
Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().

MFC r282571:
Check 'td_owepreempt' and yield the vcpu thread if it is set.

MFC r282595:
Allow byte reads of AHCI registers.

MFC r282784:
Handling indirect descriptors is a capability of the host and not one that
needs to be negotiated. Use the host capabilities field and not the negotiated
field when verifying that indirect descriptors are supported.

MFC r282788:
Allow configuration of the sector size advertised to the guest.

MFC r282865:
Set the subvendor field in config space to the vendor ID. This is required
by the Windows virtio drivers to correctly match a device.

MFC r282922:
Bump the size of the blockif scatter-gather list to 67.

MFC r283075:
Fix off-by-one in array index bounds check. bhyveload would allow you to
create 33 entries on an array that only has 32 slots

MFC r283168:
Temporarily revert r282922 which bumped the max descriptors.

MFC r283255:
Emulate the "CMP r/m, reg" instruction (opcode 39H).

MFC r283256:
Add an option "--get-vmcs-exit-inst-length" to display the instruction length
of the instruction that caused the VM-exit.

MFC r283264:
Change the header type of the emulated host-bridge from type 1 to type 0.

MFC r283293:
Don't rely on the 'VM-exit instruction length' field in the VMCS to always
have an accurate length on an EPT violation.

MFC r283299:
Remove bogus verification of instruction length after instruction decode.

MFC r283308:
Exceptions don't deliver an error code in real mode.

MFC r283657:
Fix non-deterministic delays when accessing a vcpu that was in "running" or
"sleeping" state.

MFC r283973:
Use tunable 'hw.vmm.svm.features' to disable specific SVM features even
though they might be available in hardware. Use tunable 'hw.vmm.svm.num_asids'
to limit the number of ASIDs used by the hypervisor.

MFC r284046:
Fix regression in 'verify_gla()' with the RIP-relative addressing mode.

MFC r284174:
Support guest writes to the TSC by enabling the "use TSC offsetting"
execution control.


# 284899 28-Jun-2015 neel

MFC r279444:
Allow passthrough devices to be hinted.

MFC r279683:
When ICW1 is issued the edge sense circuit is reset which means that
following an initialization a low-to-high transistion is necesary to
generate an interrupt.

MFC r279925:
Add -p parameter to list PCI device to pass through to the guest.

MFC r281559:
Fix handling of BUS_PROBE_NOWILDCARD in 'device_probe_child()'.

MFC r280447:
When fetching an instruction in non-64bit mode, consider the value of the
code segment base address.

MFC r280725:
Move legacy interrupt allocation for virtio devices to common code.

MFC r280775:
Fix the RTC device model to operate correctly in 12-hour mode.

MFC r280929:
Fix "MOVS" instruction memory to MMIO emulation.

MFC r280968:
Display instruction bytes and %rip prior to aborting due to an instruction
emulation error.

MFC r281145:
Enhance the support for Group 1 Extended opcodes for CMP, AND, OR instructions.

MFC r281542:
Initialize 'error' before use (Coverity IDs 1249748, 1249747, 1249751, 1249749)

MFC r281561:
Prior to aborting due to an ioport error, it is always interesting to see what
the guest's %rip is.

MFC r281611:
If the number of guest vcpus is less than '1' then flag it as an error.

MFC r281612:
Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly.

MFC r281630:
Relax the check on which vectors can be delivered through the APIC. According
to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.

MFC r281879:
Missing break in switch case (Coverity ID 1292499)

MFC r281946:
Don't allow guest to modify readonly bits in the PCI config 'status' register.

MFC r281987:
STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation.

MFC r282206:
Implement the century byte in the RTC.


# 284894 27-Jun-2015 neel

MFC r276428:
Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko.

MFC r276432:
Initialize all fields of 'struct vm_exception exception' before passing it
to vm_inject_exception().

MFC r276763:
Clear blocking due to STI or MOV SS in the hypervisor when an instruction is
emulated or when the vcpu incurs an exception.

MFC r277149:
Clean up usage of 'struct vm_exception' to only to communicate information
from userspace to vmm.ko when injecting an exception.

MFC r277168:
Fix typo (missing comma).

MFC r277309:
Make the error message explicit instead of just printing the usage if the
virtual machine name is not specified.

MFC r277310:
Simplify instruction restart logic in bhyve.

MFC r277359:
Fix a bug in libvmmapi 'vm_copy_setup()' where it would return success even
if the 'gpa' was in the guest MMIO region.

MFC r277360:
MOVS instruction emulation.

MFC r277626:
Add macro to identify AVIC capability (advanced virtual interrupt controller)
in AMD processors.

MFC r279220:
Don't close a block context if it couldn't be opened avoiding a null deref.

MFC r279225:
Add "-u" option to bhyve(8) to indicate that the RTC should maintain UTC time.

MFC r279227:
Emulate MSR 0xC0011024 when running on AMD processors.

MFC r279228:
Always emulate MSR_PAT on Intel processors and don't rely on PAT save/restore
capability of VT-x. This lets bhyve run nested in older VMware versions that
don't support the PAT save/restore capability.

MFC r279540:
Fix warnings/errors when building vmm.ko with gcc.


# 276403 30-Dec-2014 neel

MFC r273375
Add support AMD processors with the SVM/AMD-V hardware extensions.

MFC r273749
Remove bhyve SVM feature printf's now that they are available in the general
CPU feature detection code.

MFC r273766
Add missing 'break' pointed out by Coverity CID 1249760.

MFC r276098
Allow ktr(4) tracing of all guest exceptions via the tunable "hw.vmm.trace_guest_exceptions"

MFC r276392
Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT' on an
AMD/SVM host.

MFC r276402
Remove "svn:mergeinfo" property that was dragged along when these files were
svn copied in r273375.


# 273375 21-Oct-2014 neel

Merge projects/bhyve_svm into HEAD.

After this change bhyve supports AMD processors with the SVM/AMD-V hardware
extensions.

More details available here:
https://lists.freebsd.org/pipermail/freebsd-virtualization/2014-October/002905.html

Submitted by: Anish Gupta (akgupt3@gmail.com)
Tested by: Benjamin Perrault (ben.perrault@gmail.com)
Tested by: Willem Jan Withagen (wjw@digiware.nl)


# 273176 16-Oct-2014 neel

Use the correct fault type (VM_PROT_EXECUTE) for an instruction fetch.


# 272929 11-Oct-2014 neel

Get rid of unused headers.
Restrict scope of malloc types M_SVM and M_SVM_VLAPIC by making them static.
Replace ERR() with KASSERT().
style(9) cleanup.


# 272926 11-Oct-2014 neel

Use a consistent style for messages emitted when the module is loaded.


# 272195 27-Sep-2014 neel

Simplify register state save and restore across a VMRUN:

- Host registers are now stored on the stack instead of a per-cpu host context.

- Host %FS and %GS selectors are not saved and restored across VMRUN.
- Restoring the %FS/%GS selectors was futile anyways since that only updates
the low 32 bits of base address in the hidden descriptor state.
- GS.base is properly updated via the MSR_GSBASE on return from svm_launch().
- FS.base is not used while inside the kernel so it can be safely ignored.

- Add function prologue/epilogue so svm_launch() can be traced with Dtrace's
FBT entry/exit probes. They also serve to save/restore the host %rbp across
VMRUN.

Reviewed by: grehan
Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271939 21-Sep-2014 neel

Allow more VMCB fields to be cached:
- CR2
- CR0, CR3, CR4 and EFER
- GDT/IDT base/limit fields
- CS/DS/ES/SS selector/base/limit/attrib fields

The caching can be further restricted via the tunable 'hw.vmm.svm.vmcb_clean'.

Restructure the code such that the fields above are only modified in a single
place. This makes it easy to invalidate the VMCB cache when any of these fields
is modified.


# 271912 20-Sep-2014 neel

IFC r271888.

Restructure MSR emulation so it is all done in processor-specific code.


# 271715 17-Sep-2014 neel

IFC @r271694


# 271694 17-Sep-2014 neel

Rework vNMI injection.

Keep track of NMI blocking by enabling the IRET intercept on a successful
vNMI injection. The NMI blocking condition is cleared when the handler
executes an IRET and traps back into the hypervisor.

Don't inject NMI if the processor is in an interrupt shadow to preserve the
atomic nature of "STI;HLT". Take advantage of this and artificially set the
interrupt shadow to prevent NMI injection when restarting the "iret".

Reviewed by: Anish Gupta (akgupt3@gmail.com), grehan


# 271662 16-Sep-2014 neel

Minor cleanup.

Get rid of unused 'svm_feature' from the softc.

Get rid of the redundant 'vcpu_cnt' checks in svm.c. There is a similar check
in vmm.c against 'vm->active_cpus' before the AMD-specific code is called.

Submitted by: Anish Gupta (akgupt3@gmail.com)


# 271661 16-Sep-2014 neel

Use V_IRQ, V_INTR_VECTOR and V_TPR to offload APIC interrupt delivery to the
processor. Briefly, the hypervisor sets V_INTR_VECTOR to the APIC vector
and sets V_IRQ to 1 to indicate a pending interrupt. The hardware then takes
care of injecting this vector when the guest is able to receive it.

Legacy PIC interrupts are still delivered via the event injection mechanism.
This is because the vector injected by the PIC must reflect the state of its
pins at the time the CPU is ready to accept the interrupt.

Accesses to the TPR via %CR8 are handled entirely in hardware. This requires
that the emulated TPR must be synced to V_TPR after a #VMEXIT.

The guest can also modify the TPR via the memory mapped APIC. This requires
that the V_TPR must be synced with the emulated TPR before a VMRUN.

Reviewed by: Anish Gupta (akgupt3@gmail.com)


# 271570 14-Sep-2014 neel

Set the 'vmexit->inst_length' field properly depending on the type of the
VM-exit and ultimately on whether nRIP is valid. This allows us to update
the %rip after the emulation is finished so any exceptions triggered during
the emulation will point to the right instruction.

Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability
is available. The effective segment field in EXITINFO1 is not valid without
this capability.

Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the
VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging.

Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2
fields in bhyve(8).

Reviewed by: Anish Gupta (akgupt3@gmail.com)
Reviewed by: grehan


# 271559 13-Sep-2014 neel

Bug fixes.

- Don't enable the HLT intercept by default. It will be enabled by bhyve(8)
if required. Prior to this change HLT exiting was always enabled making
the "-H" option to bhyve(8) meaningless.

- Recognize a VM exit triggered by a non-maskable interrupt. Prior to this
change the exit would be punted to userspace and the virtual machine would
terminate.


# 271557 13-Sep-2014 neel

style(9): insert an empty line if the function has no local variables

Pointed out by: grehan


# 271554 13-Sep-2014 neel

AMD processors that have the SVM decode assist capability will store the
instruction bytes in the VMCB on a nested page fault. This is useful because
it saves having to walk the guest page tables to fetch the instruction.

vie_init() now takes two additional parameters 'inst_bytes' and 'inst_len'
that map directly to 'vie->inst[]' and 'vie->num_valid'.

The instruction emulation handler skips calling 'vmm_fetch_instruction()'
if 'vie->num_valid' is non-zero.

The use of this capability can be turned off by setting the sysctl/tunable
'hw.vmm.svm.disable_npf_assist' to '1'.

Reviewed by: Anish Gupta (akgupt3@gmail.com)
Discussed with: grehan


# 271419 11-Sep-2014 neel

style(9): indent the switch, don't indent the case, indent case body one tab.


# 271415 11-Sep-2014 neel

Repurpose the V_IRQ interrupt injection to implement VMX-style interrupt
window exiting. This simply involves setting V_IRQ and enabling the VINTR
intercept. This instructs the CPU to trap back into the hypervisor as soon
as an interrupt can be injected into the guest. The pending interrupt is
then injected via the traditional event injection mechanism.

Rework vcpu interrupt injection so that Linux guests now idle with host cpu
utilization close to 0%.

Reviewed by: Anish Gupta (earlier version)
Discussed with: grehan


# 271348 10-Sep-2014 neel

Allow intercepts and irq fields to be cached by the VMCB.

Provide APIs svm_enable_intercept()/svm_disable_intercept() to add/delete
VMCB intercepts. These APIs ensure that the VMCB state cache is invalidated
when intercepts are modified.

Each intercept is identified as a (index,bitmask) tuple. For e.g., the
VINTR intercept is identified as (VMCB_CTRL1_INTCPT,VMCB_INTCPT_VINTR).
The first 20 bytes in control area that are used to enable intercepts
are represented as 'uint32_t intercept[5]' in 'struct vmcb_ctrl'.

Modify svm_setcap() and svm_getcap() to use the new APIs.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271346 10-Sep-2014 neel

Move the VMCB initialization into svm.c in preparation for changes to the
interrupt injection logic.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271345 10-Sep-2014 neel

Move the event injection function into svm.c and add KTR logging for
every event injection.

This in in preparation for changes to SVM guest interrupt injection.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271344 10-Sep-2014 neel

Remove a bogus check that flagged an error if the guest %rip was zero.

An AP begins execution with %rip set to 0 after a startup IPI.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271343 10-Sep-2014 neel

Make the KTR tracepoints uniform and ensure that every VM-exit is logged.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271342 10-Sep-2014 neel

Allow guest read access to MSR_EFER without hypervisor intervention.

Dirty the VMCB_CACHE_CR state cache when MSR_EFER is modified.


# 271340 09-Sep-2014 neel

Remove gratuitous forward declarations.
Remove tabs on empty lines.


# 271203 06-Sep-2014 neel

Do proper ASID management for guest vcpus.

Prior to this change an ASID was hard allocated to a guest and shared by all
its vcpus. The meant that the number of VMs that could be created was limited
to the number of ASIDs supported by the CPU. It was also inefficient because
it forced a TLB flush on every VMRUN.

With this change the number of guests that can be created is independent of
the number of available ASIDs. Also, the TLB is flushed only when a new ASID
is allocated.

Discussed with: grehan
Reviewed by: Anish Gupta (akgupt3@gmail.com)


# 271152 05-Sep-2014 neel

Merge svm_set_vmcb() and svm_init_vmcb() into a single function that is called
just once when a vcpu is initialized.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271086 04-Sep-2014 neel

Consolidate the code to restore the host TSS after a #VMEXIT into a single
function restore_host_tss().

Don't bother to restore MSR_KGSBASE after a #VMEXIT since it is not used in
the kernel. It will be restored on return to userspace.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 270962 02-Sep-2014 neel

IFC @r269962

Submitted by: Anish Gupta (akgupt3@gmail.com)


# 270511 25-Aug-2014 neel

An exception is allowed to be injected even if the vcpu is in an interrupt
shadow, so move the check for pending exception before bailing out due to
an interrupt shadow.

Change return type of 'vmcb_eventinject()' to a void and convert all error
returns into KASSERTs.

Fix VMCB_EXITINTINFO_EC(x) and VMCB_EXITINTINFO_TYPE(x) to do the shift
before masking the result.

Reviewed by: Anish Gupta (akgupt3@gmail.com)


# 267367 11-Jun-2014 neel

Disable global interrupts early so all the software state maintained by bhyve
is sampled "atomically". Any interrupts after this point will be held pending
by the CPU until the guest starts executing and will immediately trigger a
#VMEXIT.

Reviewed by: Anish Gupta (akgupt3@gmail.com)


# 267305 09-Jun-2014 grehan

Temporary fix for guest idle detection.

Handle ExtINT injection for SVM. The HPET emulation
will inject a legacy interrupt at startup, and if this
isn't handled, will result in the HLT-exit code assuming
there are outstanding ExtINTs and return without sleeping.

svm_inj_interrupts() needs more changes to bring it up
to date with the VT-x version: these are forthcoming.

Reviewed by: neel


# 267218 07-Jun-2014 grehan

Allow the TSC MSR to be accessed directly from the guest.


# 267144 06-Jun-2014 grehan

ins/outs support for SVM. Modelled on the Intel VT-x code.

Remove CR2 save/restore - the guest restore/save is done
in hardware, and there is no need to save/restore the host
version (same as VT-x).

Submitted by: neel (SVM segment descriptor 'P' bit code)
Reviewed by: neel


# 267032 03-Jun-2014 grehan

Use API call when VM is detected as suspended. This fixes
the (harmless) error message on exit:

vmexit_suspend: invalid reason 217645057

Reviewed by: neel, Anish Gupta (akgupt3@gmail.com)


# 267003 03-Jun-2014 grehan

Bring (almost) up-to-date with HEAD.

- use the new virtual APIC page
- update to current bhyve APIs

Tested by Anish with multiple FreeBSD SMP VMs on a Phenom,
and verified by myself with light FreeBSD VM testing
on a Sempron 3850 APU.

The issues reported with Linux guests are very likely to still
be here, but this sync eliminates the skew between the
project branch and CURRENT, and should help to determine
the causes.

Some follow-on commits will fix minor cosmetic issues.

Submitted by: Anish Gupta (akgupt3@gmail.com)


# 261462 04-Feb-2014 grehan

Changes to the SVM code to bring it up to r259205

- Convert VMM_CTR to VCPU_CTR KTR macros
- Special handling of halt, save rflags for VMM layer to emulate
halt for vcpu(sleep to be awakened by interrupt or stop it)
- Cleanup of RVI exit handling code

Submitted by: Anish Gupta (akgupt3@gmail.com)
Reviewed by: grehan


# 259579 18-Dec-2013 grehan

Enable memory overcommit for AMD processors.

- No emulation of A/D bits is required since AMD-V RVI
supports A/D bits.
- Enable pmap PT_RVI support(w/o PAT) which is required for
memory over-commit support.
- Other minor fixes:
* Make use of VMCB EXITINTINFO field. If a #VMEXIT happens while
delivering an interrupt, EXITINTINFO has all the details that bhyve
needs to inject the same interrupt.
* SVM h/w decode assist code was incomplete - removed for now.
* Some minor code clean-up (more coming).

Submitted by: Anish Gupta (akgupt3@gmail.com)


# 256867 21-Oct-2013 neel

The ASID allocation in SVM is incorrect because it allocates a single ASID for
all vcpus belonging to a guest. This means that when different vcpus belonging
to the same guest are executing on the same host cpu there may be "leakage"
in the mappings created by one vcpu to another.

The proper fix for this is being worked on and will be committed shortly.

In the meantime workaround this bug by flushing the guest TLB entries on every
VM entry.

Submitted by: Anish Gupta (akgupt3@gmail.com)


# 256588 16-Oct-2013 grehan

Fix SVM handling of ASTPENDING, which manifested as a
hang on console output (due to a missing interrupt).

SVM does exit processing and then handles ASTPENDING which
overwrites the already handled SVM exit cause and corrupts
virtual machine state. For example, if the SVM exit was due to
an I/O port access but the main loop detected an ASTPENDING,
the exit would be processed as ASTPENDING and leave the
device (e.g. emulated UART) for that I/O port in bad state.

Submitted by: Anish Gupta (akgupt3@gmail.com)
Reviewed by: grehan


# 254677 23-Aug-2013 grehan

Add in last remaining files to get AMD-SVM operational.

Submitted by: Anish Gupta (akgupt3@gmail.com)