History log of /freebsd-11-stable/sys/amd64/vmm/amd/svm.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 365777 15-Sep-2020 emaste

MFC r365775: bhyve: do not permit write access to VMCB / VMCS

Reported by: Patrick Mooney
Submitted by: jhb
Security: CVE-2020-24718


# 365769 15-Sep-2020 kib

MFC r365766:
bhyve: intercept AMD SVM instructions.

CVE: CVE-2020-7467


# 348271 25-May-2019 rgrimes

MFC: r346714: Add accessor function for vm->maxcpus

Replace most VM_MAXCPU constant useses with an accessor function to
vm->maxcpus which for now is initialized and kept at the value of
VM_MAXCPUS.

This is a rework of Fabian Freyer (fabian.freyer_physik.tu-berlin.de)
work from D10070 to adjust it for the cpu topology changes that
occured in r332298

Approved by: re (kib)


# 340545 18-Nov-2018 jhb

MFC 339312,339364: Restore more descriptors during VM exits.

339312:
Fully restore the GDTR, IDTR, and LDTR after VT-x VM exits.

The VT-x VMCS only stores the base address of the GDTR and IDTR. As a
result, VM exits use a fixed limit of 0xffff for the host GDTR and
IDTR losing the smaller limits set in when the initial GDT is loaded
on each CPU during boot. Explicitly save and restore the full GDTR
and IDTR contents around VM entries and exits to restore the correct
limit.

Similarly, explicitly save and restore the LDT selector. VM exits
always clear the host LDTR as if the LDT was loaded with a NULL
selector and a userspace hypervisor is probably using a NULL selector
anyway, but save and restore the LDT explicitly just to be safe.

339364:
Reload the LDT selector after an AMD-v #VMEXIT.

cpu_switch() always reloads the LDT, so this can only affect the
hypervisor process itself. Fix this by explicitly reloading the host
LDT selector after each #VMEXIT. The stock bhyve process on FreeBSD
never uses a custom LDT, so this change is cosmetic.

PR: 230773


# 338691 14-Sep-2018 jhb

MFC 332454,334009,334122: Various fixes for x86 debug exceptions.

332454:
Fix PSL_T inheritance on exec for x86.

The miscellaneous x86 sysent->sv_setregs() implementations tried to
migrate PSL_T from the previous program to the new executed one, but
they evaluated regs->tf_eflags after the whole regs structure was
bzeroed. Make this functional by saving PSL_T value before zeroing.

Note that if the debugger is not attached, executing the first
instruction in the new program with PSL_T set results in SIGTRAP, and
since all intercepted signals are reset to default dispostion on
exec(2), this means that non-debugged process gets killed immediately
if PSL_T is inherited. In particular, since suid images drop
P_TRACED, attempt to set PSL_T for execution of such program would
kill the process.

Another issue with userspace PSL_T handling is that it is reset by
trap(). It is reasonable to clear PSL_T when entering SIGTRAP
handler, to allow the signal to be handled without recursion or
delivery of blocked fault. But it is not reasonable to return back to
the normal flow with PSL_T cleared. This is too late to change, I
think.

334009:
Cleanups related to debug exceptions on x86.

- Add constants for fields in DR6 and the reserved fields in DR7. Use
these constants instead of magic numbers in most places that use DR6
and DR7.
- Refer to T_TRCTRAP as "debug exception" rather than a "trace trap"
as it is not just for trace exceptions.
- Always read DR6 for debug exceptions and only clear TF in the flags
register for user exceptions where DR6.BS is set.
- Clear DR6 before returning from a debug exception handler as
recommended by the SDM dating all the way back to the 386. This
allows debuggers to determine the cause of each exception. For
kernel traps, clear DR6 in the T_TRCTRAP case and pass DR6 by value
to other parts of the handler (namely, user_dbreg_trap()). For user
traps, wait until after trapsignal to clear DR6 so that userland
debuggers can read DR6 via PT_GETDBREGS while the thread is stopped
in trapsignal().

334122:
x86: stop unconditionally clearing PSL_T on the trace trap.

We certainly should clear PSL_T when calling the SIGTRAP signal
handler, which is already done by all x86 sendsig(9) ABI code. On the
other hand, there is no obvious reason why PSL_T needs to be cleared
when returning from the signal handler. For instance, Linux allows
userspace to set PSL_T and keep tracing enabled for the desired
period. There are userspace programs which would use PSL_T if we make
it possible, for instance sbcl.

Remember if PSL_T was set by PT_STEP or PT_SETSTEP by mean of TDB_STEP
flag, and only clear it when the flag is set.


# 336190 11-Jul-2018 araujo

MFC r335030:

Add SPDX tags to vmm(4).

Sponsored by: iXsystems Inc.


# 330623 07-Mar-2018 jhb

MFC 328102: Save and restore guest debug registers.

Currently most of the debug registers are not saved and restored
during VM transitions allowing guest and host debug register values to
leak into the opposite context. One result is that hardware
watchpoints do not work reliably within a guest under VT-x.

Due to differences in SVM and VT-x, slightly different approaches are
used.

For VT-x:

- Enable debug register save/restore for VM entry/exit in the VMCS for
DR7 and MSR_DEBUGCTL.
- Explicitly save DR0-3,6 of the guest.
- Explicitly save DR0-3,6-7, MSR_DEBUGCTL, and the trap flag from
%rflags for the host. Note that because DR6 is "software" managed
and not stored in the VMCS a kernel debugger which single steps
through VM entry could corrupt the guest DR6 (since a single step
trap taken after loading the guest DR6 could alter the DR6
register). To avoid this, explicitly disable single-stepping via
the trace flag before loading the guest DR6. A determined debugger
could still defeat this by setting a breakpoint after the guest DR6
was loaded and then single-stepping.

For SVM:
- Enable debug register caching in the VMCB for DR6/DR7.
- Explicitly save DR0-3 of the guest.
- Explicitly save DR0-3,6-7, and MSR_DEBUGCTL for the host. Since SVM
saves the guest DR6 in the VMCB, the race with single-stepping
described for VT-x does not exist.

For both platforms, expose all of the guest DRx values via --get-drX
and --set-drX flags to bhyvectl.


# 330068 27-Feb-2018 avg

MFC r329364: move vintr_intercept_enabled under INVARIANTS


# 329320 15-Feb-2018 avg

MFC r328622: vmm/svm: post LAPIC interrupts using event injection

PR: 215972


# 328840 04-Feb-2018 avg

MFC r327726: vmm/svm: contigmalloc of the whole svm_softc is excessive


# 308435 08-Nov-2016 avg

MFC r307903,307904,308039,308050: vmm/svm: iopm_bitmap and msr_bitmap
must be contiguous in physical memory


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 295880 22-Feb-2016 skra

As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to
include it explicitly when <vm/pmap.h> is already included.

Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D5373


# 284712 23-Jun-2015 neel

Restore the host's GS.base before returning from 'svm_launch()'.

Previously this was done by the caller of 'svm_launch()' after it returned.
This works fine as long as no code is executed in the interim that depends
on pcpu data.

The dtrace probe 'fbt:vmm:svm_launch:return' broke this assumption because
it calls 'dtrace_probe()' which in turn relies on pcpu data.

Reported by: avg
MFC after: 1 week


# 284539 18-Jun-2015 neel

Restructure memory allocation in bhyve to support "devmem".

devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.

devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.

Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).

Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D2762
MFC after: 4 weeks


# 283973 04-Jun-2015 neel

Use tunable 'hw.vmm.svm.features' to disable specific SVM features even
though they might be available in hardware.

Use tunable 'hw.vmm.svm.num_asids' to limit the number of ASIDs used by
the hypervisor.

MFC after: 1 week


# 283657 28-May-2015 neel

Fix non-deterministic delays when accessing a vcpu that was in "running" or
"sleeping" state. This is done by forcing the vcpu to transition to "idle"
by returning to userspace with an exit code of VM_EXITCODE_REQIDLE.

MFC after: 2 weeks


# 282520 06-May-2015 neel

Do a proper emulation of guest writes to MSR_EFER.
- Must-Be-Zero bits cannot be set.
- EFER_LME and EFER_LMA should respect the long mode consistency checks.
- EFER_NXE, EFER_FFXSR, EFER_TCE can be set if allowed by CPUID capabilities.
- Flag an error if guest tries to set EFER_LMSLE since bhyve doesn't enforce
segment limits in 64-bit mode.

MFC after: 2 weeks


# 281879 23-Apr-2015 araujo

Missing break in switch case.

Differential Revision: D2342
Reviewed by: neel


# 281612 16-Apr-2015 neel

Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly.

MFC after: 1 week


# 280447 24-Mar-2015 tychon

When fetching an instruction in non-64bit mode, consider the value of the
code segment base address.

Also if an instruction doesn't support a mod R/M (modRM) byte, don't
be concerned if the CPU is in real mode.

Reviewed by: neel


# 279540 02-Mar-2015 neel

Fix warnings/errors when building vmm.ko with gcc:

- fix warning about comparison of 'uint8_t v_tpr >= 0' always being true.

- fix error triggered by an empty clobber list in the inline assembly for
"clgi" and "stgi"

- fix error when compiling "vmload %rax", "vmrun %rax" and "vmsave %rax". The
gcc assembler does not like the explicit operand "%rax" while the clang
assembler requires specifying the operand "%rax". Fix this by encoding the
instructions using the ".byte" directive.

Reported by: julian
MFC after: 1 week


# 277626 23-Jan-2015 neel

Add macro to identify AVIC capability (advanced virtual interrupt controller)
in AMD processors.

Submitted by: Dmitry Luhtionov (dmitryluhtionov@gmail.com)


# 277149 13-Jan-2015 neel

'struct vm_exception' was intended to be used only as the collateral for the
VM_INJECT_EXCEPTION ioctl. However it morphed into other uses like keeping
track pending exceptions for a vcpu. This in turn causes confusion because
some fields in 'struct vm_exception' like 'vcpuid' make sense only in the
ioctl context. It also makes it harder to add or remove structure fields.

Fix this by using 'struct vm_exception' only to communicate information
from userspace to vmm.ko when injecting an exception.

Also, add a field 'restart_instruction' to 'struct vm_exception'. This
field is set to '1' for exceptions where the faulting instruction is
restarted after the exception is handled.

MFC after: 1 week


# 276763 06-Jan-2015 neel

Clear blocking due to STI or MOV SS in the hypervisor when an instruction is
emulated or when the vcpu incurs an exception. This matches the CPU behavior.

Remove special case code in HLT processing that was clearing the interrupt
shadow. This is now redundant because the interrupt shadow is always cleared
when the vcpu is resumed after an instruction is emulated.

Reported by: David Reed (david.reed@tidalscale.com)
MFC after: 2 weeks


# 276432 30-Dec-2014 neel

Initialize all fields of 'struct vm_exception exception' before passing it to
vm_inject_exception(). This fixes the issue that 'exception.cpuid' is
uninitialized when calling 'vm_inject_exception()'.

However, in practice this change is a no-op because vm_inject_exception()
does not use 'exception.cpuid' for anything.

Reported by: Coverity Scan
CID: 1261297
MFC after: 3 days


# 276402 30-Dec-2014 neel

Remove "svn:mergeinfo" property that was dragged along when these files were
svn copied in r273375.

Suggested by: ngie, gjb


# 276392 30-Dec-2014 neel

Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT' on
an AMD/SVM host.

MFC after: 1 week


# 276098 23-Dec-2014 neel

Allow ktr(4) tracing of all guest exceptions via the tunable
"hw.vmm.trace_guest_exceptions". To enable this feature set the tunable
to "1" before loading vmm.ko.

Tracing the guest exceptions can be useful when debugging guest triple faults.

Note that there is a performance impact when exception tracing is enabled
since every exception will now trigger a VM-exit.

Also, handle machine check exceptions that happen during guest execution
by vectoring to the host's machine check handler via "int $18".

Discussed with: grehan
MFC after: 2 weeks


# 273749 27-Oct-2014 grehan

Remove bhyve SVM feature printf's now that they are available in the
general CPU feature detection code.

Reviewed by: neel


# 273375 21-Oct-2014 neel

Merge projects/bhyve_svm into HEAD.

After this change bhyve supports AMD processors with the SVM/AMD-V hardware
extensions.

More details available here:
https://lists.freebsd.org/pipermail/freebsd-virtualization/2014-October/002905.html

Submitted by: Anish Gupta (akgupt3@gmail.com)
Tested by: Benjamin Perrault (ben.perrault@gmail.com)
Tested by: Willem Jan Withagen (wjw@digiware.nl)


# 273176 16-Oct-2014 neel

Use the correct fault type (VM_PROT_EXECUTE) for an instruction fetch.


# 272929 11-Oct-2014 neel

Get rid of unused headers.
Restrict scope of malloc types M_SVM and M_SVM_VLAPIC by making them static.
Replace ERR() with KASSERT().
style(9) cleanup.


# 272926 11-Oct-2014 neel

Use a consistent style for messages emitted when the module is loaded.


# 272195 27-Sep-2014 neel

Simplify register state save and restore across a VMRUN:

- Host registers are now stored on the stack instead of a per-cpu host context.

- Host %FS and %GS selectors are not saved and restored across VMRUN.
- Restoring the %FS/%GS selectors was futile anyways since that only updates
the low 32 bits of base address in the hidden descriptor state.
- GS.base is properly updated via the MSR_GSBASE on return from svm_launch().
- FS.base is not used while inside the kernel so it can be safely ignored.

- Add function prologue/epilogue so svm_launch() can be traced with Dtrace's
FBT entry/exit probes. They also serve to save/restore the host %rbp across
VMRUN.

Reviewed by: grehan
Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271939 21-Sep-2014 neel

Allow more VMCB fields to be cached:
- CR2
- CR0, CR3, CR4 and EFER
- GDT/IDT base/limit fields
- CS/DS/ES/SS selector/base/limit/attrib fields

The caching can be further restricted via the tunable 'hw.vmm.svm.vmcb_clean'.

Restructure the code such that the fields above are only modified in a single
place. This makes it easy to invalidate the VMCB cache when any of these fields
is modified.


# 271912 20-Sep-2014 neel

IFC r271888.

Restructure MSR emulation so it is all done in processor-specific code.


# 271715 17-Sep-2014 neel

IFC @r271694


# 271694 16-Sep-2014 neel

Rework vNMI injection.

Keep track of NMI blocking by enabling the IRET intercept on a successful
vNMI injection. The NMI blocking condition is cleared when the handler
executes an IRET and traps back into the hypervisor.

Don't inject NMI if the processor is in an interrupt shadow to preserve the
atomic nature of "STI;HLT". Take advantage of this and artificially set the
interrupt shadow to prevent NMI injection when restarting the "iret".

Reviewed by: Anish Gupta (akgupt3@gmail.com), grehan


# 271662 16-Sep-2014 neel

Minor cleanup.

Get rid of unused 'svm_feature' from the softc.

Get rid of the redundant 'vcpu_cnt' checks in svm.c. There is a similar check
in vmm.c against 'vm->active_cpus' before the AMD-specific code is called.

Submitted by: Anish Gupta (akgupt3@gmail.com)


# 271661 16-Sep-2014 neel

Use V_IRQ, V_INTR_VECTOR and V_TPR to offload APIC interrupt delivery to the
processor. Briefly, the hypervisor sets V_INTR_VECTOR to the APIC vector
and sets V_IRQ to 1 to indicate a pending interrupt. The hardware then takes
care of injecting this vector when the guest is able to receive it.

Legacy PIC interrupts are still delivered via the event injection mechanism.
This is because the vector injected by the PIC must reflect the state of its
pins at the time the CPU is ready to accept the interrupt.

Accesses to the TPR via %CR8 are handled entirely in hardware. This requires
that the emulated TPR must be synced to V_TPR after a #VMEXIT.

The guest can also modify the TPR via the memory mapped APIC. This requires
that the V_TPR must be synced with the emulated TPR before a VMRUN.

Reviewed by: Anish Gupta (akgupt3@gmail.com)


# 271570 14-Sep-2014 neel

Set the 'vmexit->inst_length' field properly depending on the type of the
VM-exit and ultimately on whether nRIP is valid. This allows us to update
the %rip after the emulation is finished so any exceptions triggered during
the emulation will point to the right instruction.

Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability
is available. The effective segment field in EXITINFO1 is not valid without
this capability.

Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the
VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging.

Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2
fields in bhyve(8).

Reviewed by: Anish Gupta (akgupt3@gmail.com)
Reviewed by: grehan


# 271559 13-Sep-2014 neel

Bug fixes.

- Don't enable the HLT intercept by default. It will be enabled by bhyve(8)
if required. Prior to this change HLT exiting was always enabled making
the "-H" option to bhyve(8) meaningless.

- Recognize a VM exit triggered by a non-maskable interrupt. Prior to this
change the exit would be punted to userspace and the virtual machine would
terminate.


# 271557 13-Sep-2014 neel

style(9): insert an empty line if the function has no local variables

Pointed out by: grehan


# 271554 13-Sep-2014 neel

AMD processors that have the SVM decode assist capability will store the
instruction bytes in the VMCB on a nested page fault. This is useful because
it saves having to walk the guest page tables to fetch the instruction.

vie_init() now takes two additional parameters 'inst_bytes' and 'inst_len'
that map directly to 'vie->inst[]' and 'vie->num_valid'.

The instruction emulation handler skips calling 'vmm_fetch_instruction()'
if 'vie->num_valid' is non-zero.

The use of this capability can be turned off by setting the sysctl/tunable
'hw.vmm.svm.disable_npf_assist' to '1'.

Reviewed by: Anish Gupta (akgupt3@gmail.com)
Discussed with: grehan


# 271419 11-Sep-2014 neel

style(9): indent the switch, don't indent the case, indent case body one tab.


# 271415 11-Sep-2014 neel

Repurpose the V_IRQ interrupt injection to implement VMX-style interrupt
window exiting. This simply involves setting V_IRQ and enabling the VINTR
intercept. This instructs the CPU to trap back into the hypervisor as soon
as an interrupt can be injected into the guest. The pending interrupt is
then injected via the traditional event injection mechanism.

Rework vcpu interrupt injection so that Linux guests now idle with host cpu
utilization close to 0%.

Reviewed by: Anish Gupta (earlier version)
Discussed with: grehan


# 271348 10-Sep-2014 neel

Allow intercepts and irq fields to be cached by the VMCB.

Provide APIs svm_enable_intercept()/svm_disable_intercept() to add/delete
VMCB intercepts. These APIs ensure that the VMCB state cache is invalidated
when intercepts are modified.

Each intercept is identified as a (index,bitmask) tuple. For e.g., the
VINTR intercept is identified as (VMCB_CTRL1_INTCPT,VMCB_INTCPT_VINTR).
The first 20 bytes in control area that are used to enable intercepts
are represented as 'uint32_t intercept[5]' in 'struct vmcb_ctrl'.

Modify svm_setcap() and svm_getcap() to use the new APIs.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271346 10-Sep-2014 neel

Move the VMCB initialization into svm.c in preparation for changes to the
interrupt injection logic.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271345 10-Sep-2014 neel

Move the event injection function into svm.c and add KTR logging for
every event injection.

This in in preparation for changes to SVM guest interrupt injection.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271344 09-Sep-2014 neel

Remove a bogus check that flagged an error if the guest %rip was zero.

An AP begins execution with %rip set to 0 after a startup IPI.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271343 09-Sep-2014 neel

Make the KTR tracepoints uniform and ensure that every VM-exit is logged.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271342 09-Sep-2014 neel

Allow guest read access to MSR_EFER without hypervisor intervention.

Dirty the VMCB_CACHE_CR state cache when MSR_EFER is modified.


# 271340 09-Sep-2014 neel

Remove gratuitous forward declarations.
Remove tabs on empty lines.


# 271203 06-Sep-2014 neel

Do proper ASID management for guest vcpus.

Prior to this change an ASID was hard allocated to a guest and shared by all
its vcpus. The meant that the number of VMs that could be created was limited
to the number of ASIDs supported by the CPU. It was also inefficient because
it forced a TLB flush on every VMRUN.

With this change the number of guests that can be created is independent of
the number of available ASIDs. Also, the TLB is flushed only when a new ASID
is allocated.

Discussed with: grehan
Reviewed by: Anish Gupta (akgupt3@gmail.com)


# 271152 05-Sep-2014 neel

Merge svm_set_vmcb() and svm_init_vmcb() into a single function that is called
just once when a vcpu is initialized.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 271086 04-Sep-2014 neel

Consolidate the code to restore the host TSS after a #VMEXIT into a single
function restore_host_tss().

Don't bother to restore MSR_KGSBASE after a #VMEXIT since it is not used in
the kernel. It will be restored on return to userspace.

Discussed with: Anish Gupta (akgupt3@gmail.com)


# 270962 02-Sep-2014 neel

IFC @r269962

Submitted by: Anish Gupta (akgupt3@gmail.com)


# 270511 24-Aug-2014 neel

An exception is allowed to be injected even if the vcpu is in an interrupt
shadow, so move the check for pending exception before bailing out due to
an interrupt shadow.

Change return type of 'vmcb_eventinject()' to a void and convert all error
returns into KASSERTs.

Fix VMCB_EXITINTINFO_EC(x) and VMCB_EXITINTINFO_TYPE(x) to do the shift
before masking the result.

Reviewed by: Anish Gupta (akgupt3@gmail.com)


# 267367 11-Jun-2014 neel

Disable global interrupts early so all the software state maintained by bhyve
is sampled "atomically". Any interrupts after this point will be held pending
by the CPU until the guest starts executing and will immediately trigger a
#VMEXIT.

Reviewed by: Anish Gupta (akgupt3@gmail.com)


# 267305 09-Jun-2014 grehan

Temporary fix for guest idle detection.

Handle ExtINT injection for SVM. The HPET emulation
will inject a legacy interrupt at startup, and if this
isn't handled, will result in the HLT-exit code assuming
there are outstanding ExtINTs and return without sleeping.

svm_inj_interrupts() needs more changes to bring it up
to date with the VT-x version: these are forthcoming.

Reviewed by: neel


# 267218 07-Jun-2014 grehan

Allow the TSC MSR to be accessed directly from the guest.


# 267144 06-Jun-2014 grehan

ins/outs support for SVM. Modelled on the Intel VT-x code.

Remove CR2 save/restore - the guest restore/save is done
in hardware, and there is no need to save/restore the host
version (same as VT-x).

Submitted by: neel (SVM segment descriptor 'P' bit code)
Reviewed by: neel


# 267032 03-Jun-2014 grehan

Use API call when VM is detected as suspended. This fixes
the (harmless) error message on exit:

vmexit_suspend: invalid reason 217645057

Reviewed by: neel, Anish Gupta (akgupt3@gmail.com)


# 267003 03-Jun-2014 grehan

Bring (almost) up-to-date with HEAD.

- use the new virtual APIC page
- update to current bhyve APIs

Tested by Anish with multiple FreeBSD SMP VMs on a Phenom,
and verified by myself with light FreeBSD VM testing
on a Sempron 3850 APU.

The issues reported with Linux guests are very likely to still
be here, but this sync eliminates the skew between the
project branch and CURRENT, and should help to determine
the causes.

Some follow-on commits will fix minor cosmetic issues.

Submitted by: Anish Gupta (akgupt3@gmail.com)


# 261462 04-Feb-2014 grehan

Changes to the SVM code to bring it up to r259205

- Convert VMM_CTR to VCPU_CTR KTR macros
- Special handling of halt, save rflags for VMM layer to emulate
halt for vcpu(sleep to be awakened by interrupt or stop it)
- Cleanup of RVI exit handling code

Submitted by: Anish Gupta (akgupt3@gmail.com)
Reviewed by: grehan


# 259579 18-Dec-2013 grehan

Enable memory overcommit for AMD processors.

- No emulation of A/D bits is required since AMD-V RVI
supports A/D bits.
- Enable pmap PT_RVI support(w/o PAT) which is required for
memory over-commit support.
- Other minor fixes:
* Make use of VMCB EXITINTINFO field. If a #VMEXIT happens while
delivering an interrupt, EXITINTINFO has all the details that bhyve
needs to inject the same interrupt.
* SVM h/w decode assist code was incomplete - removed for now.
* Some minor code clean-up (more coming).

Submitted by: Anish Gupta (akgupt3@gmail.com)


# 256867 21-Oct-2013 neel

The ASID allocation in SVM is incorrect because it allocates a single ASID for
all vcpus belonging to a guest. This means that when different vcpus belonging
to the same guest are executing on the same host cpu there may be "leakage"
in the mappings created by one vcpu to another.

The proper fix for this is being worked on and will be committed shortly.

In the meantime workaround this bug by flushing the guest TLB entries on every
VM entry.

Submitted by: Anish Gupta (akgupt3@gmail.com)


# 256588 16-Oct-2013 grehan

Fix SVM handling of ASTPENDING, which manifested as a
hang on console output (due to a missing interrupt).

SVM does exit processing and then handles ASTPENDING which
overwrites the already handled SVM exit cause and corrupts
virtual machine state. For example, if the SVM exit was due to
an I/O port access but the main loop detected an ASTPENDING,
the exit would be processed as ASTPENDING and leave the
device (e.g. emulated UART) for that I/O port in bad state.

Submitted by: Anish Gupta (akgupt3@gmail.com)
Reviewed by: grehan


# 254677 22-Aug-2013 grehan

Add in last remaining files to get AMD-SVM operational.

Submitted by: Anish Gupta (akgupt3@gmail.com)