History log of /linux-master/arch/x86/kvm/svm/sev.c
Revision Date Author Comments
# 4710e4fc 09-Mar-2024 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

KVM: SVM: Remove a useless zeroing of allocated memory

Remove KVM's unnecessary zeroing of memory when allocating the pages array
in sev_pin_memory() via __vmalloc(), as the array is only used to hold
kernel pointers. The kmalloc() path for "small" regions doesn't zero the
array, and if KVM leaks state and/or accesses uninitialized data, then the
kernel has bigger problems.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/c7619a3d3cbb36463531a7c73ccbde9db587986c.1710004509.git.christophe.jaillet@wanadoo.fr
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 0ecaefb3 27-Mar-2024 Borislav Petkov (AMD) <bp@alien8.de>

x86/CPU/AMD: Track SNP host status with cc_platform_*()

The host SNP worthiness can determined later, after alternatives have
been patched, in snp_rmptable_init() depending on cmdline options like
iommu=pt which is incompatible with SNP, for example.

Which means that one cannot use X86_FEATURE_SEV_SNP and will need to
have a special flag for that control.

Use that newly added CC_ATTR_HOST_SEV_SNP in the appropriate places.

Move kdump_sev_callback() to its rightful place, while at it.

Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Srikanth Aithal <sraithal@amd.com>
Link: https://lore.kernel.org/r/20240327154317.29909-6-bp@alien8.de


# fdd58834 31-Jan-2024 Sean Christopherson <seanjc@google.com>

KVM: SVM: Return -EINVAL instead of -EBUSY on attempt to re-init SEV/SEV-ES

Return -EINVAL instead of -EBUSY if userspace attempts KVM_SEV{,ES}_INIT
on a VM that already has SEV active. Returning -EBUSY is nonsencial as
it's impossible to deactivate SEV without destroying the VM, i.e. the VM
isn't "busy" in any sane sense of the word, and the odds of any userspace
wanting exactly -EBUSY on a userspace bug are minuscule.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240131235609.4161407-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 0aa6b90e 31-Jan-2024 Ashish Kalra <ashish.kalra@amd.com>

KVM: SVM: Add support for allowing zero SEV ASIDs

Some BIOSes allow the end user to set the minimum SEV ASID value
(CPUID 0x8000001F_EDX) to be greater than the maximum number of
encrypted guests, or maximum SEV ASID value (CPUID 0x8000001F_ECX)
in order to dedicate all the SEV ASIDs to SEV-ES or SEV-SNP.

The SEV support, as coded, does not handle the case where the minimum
SEV ASID value can be greater than the maximum SEV ASID value.
As a result, the following confusing message is issued:

[ 30.715724] kvm_amd: SEV enabled (ASIDs 1007 - 1006)

Fix the support to properly handle this case.

Fixes: 916391a2d1dc ("KVM: SVM: Add support for SEV-ES capability in KVM")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Cc: stable@vger.kernel.org
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240104190520.62510-1-Ashish.Kalra@amd.com
Link: https://lore.kernel.org/r/20240131235609.4161407-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 466eec4a 31-Jan-2024 Sean Christopherson <seanjc@google.com>

KVM: SVM: Use unsigned integers when dealing with ASIDs

Convert all local ASID variables and parameters throughout the SEV code
from signed integers to unsigned integers. As ASIDs are fundamentally
unsigned values, and the global min/max variables are appropriately
unsigned integers, too.

Functionally, this is a glorified nop as KVM guarantees min_sev_asid is
non-zero, and no CPU supports -1u as the _only_ asid, i.e. the signed vs.
unsigned goof won't cause problems in practice.

Opportunistically use sev_get_asid() in sev_flush_encrypted_page() instead
of open coding an equivalent.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240131235609.4161407-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# cc4ce37b 31-Jan-2024 Sean Christopherson <seanjc@google.com>

KVM: SVM: Set sev->asid in sev_asid_new() instead of overloading the return

Explicitly set sev->asid in sev_asid_new() when a new ASID is successfully
allocated, and return '0' to indicate success instead of overloading the
return value to multiplex the ASID with error codes. There is exactly one
caller of sev_asid_new(), and sev_asid_free() already consumes sev->asid,
i.e. returning the ASID isn't necessary for flexibility, nor does it
provide symmetry between related APIs.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240131235609.4161407-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 75253db4 25-Jan-2024 Brijesh Singh <brijesh.singh@amd.com>

KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe

Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.

When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" via a reserved bit in the corresponding RMP
entry after a successful VMRUN. This is done for _all_ VMs, not just
SNP-Active VMs.

If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2MB-aligned and software accesses any
part of the associated 2MB region with a hugepage, the CPU will
incorrectly treat the entire 2MB region as in-use and signal a an RMP
violation #PF.

To avoid this, the recommendation is to not use a 2MB-aligned page for
the VMCB, VMSA or AVIC pages. Add a generic allocator that will ensure
that the page returned is not 2MB-aligned and is safe to be used when
SEV-SNP is enabled. Also implement similar handling for the VMCB/VMSA
pages of nested guests.

[ mdr: Squash in nested guest handling from Ashish, commit msg fixups. ]

Reported-by: Alper Gun <alpergun@google.com> # for nested VMSA case
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20240126041126.1927228-22-michael.roth@amd.com


# 1ca5614b 25-Jan-2024 Brijesh Singh <brijesh.singh@amd.com>

crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

Before SNP VMs can be launched, the platform must be appropriately
configured and initialized via the SNP_INIT command.

During the execution of SNP_INIT command, the firmware configures
and enables SNP security policy enforcement in many system components.
Some system components write to regions of memory reserved by early
x86 firmware (e.g. UEFI). Other system components write to regions
provided by the operation system, hypervisor, or x86 firmware.
Such system components can only write to HV-fixed pages or Default
pages. They will error when attempting to write to pages in other page
states after SNP_INIT enables their SNP enforcement.

Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
system physical address ranges to convert into the HV-fixed page states
during the RMP initialization. If INIT_RMP is 1, hypervisors should
provide all system physical address ranges that the hypervisor will
never assign to a guest until the next RMP re-initialization.

For instance, the memory that UEFI reserves should be included in the
range list. This allows system components that occasionally write to
memory (e.g. logging to UEFI reserved regions) to not fail due to
RMP initialization and SNP enablement.

Note that SNP_INIT(_EX) must not be executed while non-SEV guests are
executing, otherwise it is possible that the system could reset or hang.
The psp_init_on_probe module parameter was added for SEV/SEV-ES support
and the init_ex_path module parameter to allow for time for the
necessary file system to be mounted/available.

SNP_INIT(_EX) does not use the file associated with init_ex_path. So, to
avoid running into issues where SNP_INIT(_EX) is called while there are
other running guests, issue it during module probe regardless of the
psp_init_on_probe setting, but maintain the previous deferrable handling
for SEV/SEV-ES initialization.

[ mdr: Squash in psp_init_on_probe changes from Tom, reduce
proliferation of 'probe' function parameter where possible.
bp: Fix 32-bit allmodconfig build. ]

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240126041126.1927228-14-michael.roth@amd.com


# 5abf6dce 09-Mar-2024 Paolo Bonzini <pbonzini@redhat.com>

SEV: disable SEV-ES DebugSwap by default

The DebugSwap feature of SEV-ES provides a way for confidential guests to use
data breakpoints. However, because the status of the DebugSwap feature is
recorded in the VMSA, enabling it by default invalidates the attestation
signatures. In 6.10 we will introduce a new API to create SEV VMs that
will allow enabling DebugSwap based on what the user tells KVM to do.
Contextually, we will change the legacy KVM_SEV_ES_INIT API to never
enable DebugSwap.

For compatibility with kernels that pre-date the introduction of DebugSwap,
as well as with those where KVM_SEV_ES_INIT will never enable it, do not enable
the feature by default. If anybody wants to use it, for now they can enable
the sev_es_debug_swap_enabled module parameter, but this will result in a
warning.

Fixes: d1f85fbe836e ("KVM: SEV: Enable data breakpoints in SEV-ES")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 5ef1d8c1 16-Feb-2024 Sean Christopherson <seanjc@google.com>

KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()

Do the cache flush of converted pages in svm_register_enc_region() before
dropping kvm->lock to fix use-after-free issues where region and/or its
array of pages could be freed by a different task, e.g. if userspace has
__unregister_enc_region_locked() already queued up for the region.

Note, the "obvious" alternative of using local variables doesn't fully
resolve the bug, as region->pages is also dynamically allocated. I.e. the
region structure itself would be fine, but region->pages could be freed.

Flushing multiple pages under kvm->lock is unfortunate, but the entire
flow is a rare slow path, and the manual flush is only needed on CPUs that
lack coherency for encrypted memory.

Fixes: 19a23da53932 ("Fix unsynchronized access to sev members through svm_register_enc_region")
Reported-by: Gabe Kirkpatrick <gkirkpatrick@google.com>
Cc: Josh Eads <josheads@google.com>
Cc: Peter Gonda <pgonda@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20240217013430.2079561-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 770d6aa2 18-Oct-2023 Sean Christopherson <seanjc@google.com>

KVM: SVM: Explicitly require FLUSHBYASID to enable SEV support

Add a sanity check that FLUSHBYASID is available if SEV is supported in
hardware, as SEV (and beyond) guests are bound to a single ASID, i.e. KVM
can't "flush" by assigning a new, fresh ASID to the guest. If FLUSHBYASID
isn't supported for some bizarre reason, KVM would completely fail to do
TLB flushes for SEV+ guests (see pre_svm_run() and pre_sev_run()).

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20231018193617.1895752-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# a26b7cd2 16-Oct-2023 Michael Roth <michael.roth@amd.com>

KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests

When intercepts are enabled for MSR_IA32_XSS, the host will swap in/out
the guest-defined values while context-switching to/from guest mode.
However, in the case of SEV-ES, vcpu->arch.guest_state_protected is set,
so the guest-defined value is effectively ignored when switching to
guest mode with the understanding that the VMSA will handle swapping
in/out this register state.

However, SVM is still configured to intercept these accesses for SEV-ES
guests, so the values in the initial MSR_IA32_XSS are effectively
read-only, and a guest will experience undefined behavior if it actually
tries to write to this MSR. Fortunately, only CET/shadowstack makes use
of this register on SEV-ES-capable systems currently, which isn't yet
widely used, but this may become more of an issue in the future.

Additionally, enabling intercepts of MSR_IA32_XSS results in #VC
exceptions in the guest in certain paths that can lead to unexpected #VC
nesting levels. One example is SEV-SNP guests when handling #VC
exceptions for CPUID instructions involving leaf 0xD, subleaf 0x1, since
they will access MSR_IA32_XSS as part of servicing the CPUID #VC, then
generate another #VC when accessing MSR_IA32_XSS, which can lead to
guest crashes if an NMI occurs at that point in time. Running perf on a
guest while it is issuing such a sequence is one example where these can
be problematic.

Address this by disabling intercepts of MSR_IA32_XSS for SEV-ES guests
if the host/guest configuration allows it. If the host/guest
configuration doesn't allow for MSR_IA32_XSS, leave it intercepted so
that it can be caught by the existing checks in
kvm_{set,get}_msr_common() if the guest still attempts to access it.

Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Cc: Alexey Kardashevskiy <aik@amd.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20231016132819.1002933-4-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# e0096d01 15-Sep-2023 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Fix TSC_AUX virtualization setup

The checks for virtualizing TSC_AUX occur during the vCPU reset processing
path. However, at the time of initial vCPU reset processing, when the vCPU
is first created, not all of the guest CPUID information has been set. In
this case the RDTSCP and RDPID feature support for the guest is not in
place and so TSC_AUX virtualization is not established.

This continues for each vCPU created for the guest. On the first boot of
an AP, vCPU reset processing is executed as a result of an APIC INIT
event, this time with all of the guest CPUID information set, resulting
in TSC_AUX virtualization being enabled, but only for the APs. The BSP
always sees a TSC_AUX value of 0 which probably went unnoticed because,
at least for Linux, the BSP TSC_AUX value is 0.

Move the TSC_AUX virtualization enablement out of the init_vmcb() path and
into the vcpu_after_set_cpuid() path to allow for proper initialization of
the support after the guest CPUID information has been set.

With the TSC_AUX virtualization support now in the vcpu_set_after_cpuid()
path, the intercepts must be either cleared or set based on the guest
CPUID input.

Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# e8d93d5d 22-Sep-2023 Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway

svm_recalc_instruction_intercepts() is always called at least once
before the vCPU is started, so the setting or clearing of the RDTSCP
intercept can be dropped from the TSC_AUX virtualization support.

Extracted from a patch by Tom Lendacky.

Cc: stable@vger.kernel.org
Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 80d0f521 24-Aug-2023 Sean Christopherson <seanjc@google.com>

KVM: SVM: Require nrips support for SEV guests (and beyond)

Disallow SEV (and beyond) if nrips is disabled via module param, as KVM
can't read guest memory to partially emulate and skip an instruction. All
CPUs that support SEV support NRIPS, i.e. this is purely stopping the user
from shooting themselves in the foot.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230825013621.2845700-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 1952e74d 24-Aug-2023 Sean Christopherson <seanjc@google.com>

KVM: SVM: Skip VMSA init in sev_es_init_vmcb() if pointer is NULL

Skip initializing the VMSA physical address in the VMCB if the VMSA is
NULL, which occurs during intrahost migration as KVM initializes the VMCB
before copying over state from the source to the destination (including
the VMSA and its physical address).

In normal builds, __pa() is just math, so the bug isn't fatal, but with
CONFIG_DEBUG_VIRTUAL=y, the validity of the virtual address is verified
and passing in NULL will make the kernel unhappy.

Fixes: 6defa24d3b12 ("KVM: SEV: Init target VMCBs in sev_migrate_from")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Link: https://lore.kernel.org/r/20230825022357.2852133-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# f1187ef2 24-Aug-2023 Sean Christopherson <seanjc@google.com>

KVM: SVM: Get source vCPUs from source VM for SEV-ES intrahost migration

Fix a goof where KVM tries to grab source vCPUs from the destination VM
when doing intrahost migration. Grabbing the wrong vCPU not only hoses
the guest, it also crashes the host due to the VMSA pointer being left
NULL.

BUG: unable to handle page fault for address: ffffe38687000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 39 PID: 17143 Comm: sev_migrate_tes Tainted: GO 6.5.0-smp--fff2e47e6c3b-next #151
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.28.0 07/10/2023
RIP: 0010:__free_pages+0x15/0xd0
RSP: 0018:ffff923fcf6e3c78 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffe38687000000 RCX: 0000000000000100
RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffe38687000000
RBP: ffff923fcf6e3c88 R08: ffff923fcafb0000 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff83619b90 R12: ffff923fa9540000
R13: 0000000000080007 R14: ffff923f6d35d000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff929d0d7c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffe38687000000 CR3: 0000005224c34005 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
<TASK>
sev_free_vcpu+0xcb/0x110 [kvm_amd]
svm_vcpu_free+0x75/0xf0 [kvm_amd]
kvm_arch_vcpu_destroy+0x36/0x140 [kvm]
kvm_destroy_vcpus+0x67/0x100 [kvm]
kvm_arch_destroy_vm+0x161/0x1d0 [kvm]
kvm_put_kvm+0x276/0x560 [kvm]
kvm_vm_release+0x25/0x30 [kvm]
__fput+0x106/0x280
____fput+0x12/0x20
task_work_run+0x86/0xb0
do_exit+0x2e3/0x9c0
do_group_exit+0xb1/0xc0
__x64_sys_exit_group+0x1b/0x20
do_syscall_64+0x41/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
CR2: ffffe38687000000

Fixes: 6defa24d3b12 ("KVM: SEV: Init target VMCBs in sev_migrate_from")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Link: https://lore.kernel.org/r/20230825022357.2852133-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 389fbbec 15-Jun-2023 Sean Christopherson <seanjc@google.com>

KVM: SVM: Don't defer NMI unblocking until next exit for SEV-ES guests

Immediately mark NMIs as unmasked in response to #VMGEXIT(NMI complete)
instead of setting awaiting_iret_completion and waiting until the *next*
VM-Exit to unmask NMIs. The whole point of "NMI complete" is that the
guest is responsible for telling the hypervisor when it's safe to inject
an NMI, i.e. there's no need to wait. And because there's no IRET to
single-step, the next VM-Exit could be a long time coming, i.e. KVM could
incorrectly hold an NMI pending for far longer than what is required and
expected.

Opportunistically fix a stale reference to HF_IRET_MASK.

Fixes: 916b54a7688b ("KVM: x86: Move HF_NMI_MASK and HF_IRET_MASK into "struct vcpu_svm"")
Fixes: 4444dfe4050b ("KVM: SVM: Add NMI support for an SEV-ES guest")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-9-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 90cbf6d9 15-Jun-2023 Alexey Kardashevskiy <aik@amd.com>

KVM: SEV-ES: Eliminate #DB intercept when DebugSwap enabled

Disable #DB for SEV-ES guests when DebugSwap is enabled. There is no point
in such intercept as KVM does not allow guest debug for SEV-ES guests.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-8-aik@amd.com
[sean: add comment as to why KVM disables #DB intercept iff DebugSwap=1]
Signed-off-by: Sean Christopherson <seanjc@google.com>


# d1f85fbe 15-Jun-2023 Alexey Kardashevskiy <aik@amd.com>

KVM: SEV: Enable data breakpoints in SEV-ES

Add support for "DebugSwap for SEV-ES guests", which provides support
for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
allows KVM to expose debug capabilities to SEV-ES guests. Without
DebugSwap support, the CPU doesn't save/load most _guest_ debug
registers (except DR6/7), and KVM cannot manually context switch guest
DRs due the VMSA being encrypted.

Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
vectoring a #DB. Without NoNestedDataBp, a malicious guest can DoS
the host by putting the CPU into an infinite loop of vectoring #DBs
(see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)

Set the features bit in sev_es_sync_vmsa() which is the last point
when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
init happens) is called not only when VCPU is initialised but also on
intrahost migration when VMSA is encrypted.

Eliminate DR7 intercepts as KVM can't modify guest DR7, and intercepting
DR7 would completely defeat the purpose of enabling DebugSwap.

Make X86_FEATURE_DEBUG_SWAP appear in /proc/cpuinfo (by not adding "") to
let the operator know if the VM can debug.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-7-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# c2690b5f 15-Jun-2023 Alexey Kardashevskiy <aik@amd.com>

KVM: SVM/SEV/SEV-ES: Rework intercepts

Currently SVM setup is done sequentially in
init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb()
and tries keeping SVM/SEV/SEV-ES bits separated. One of the exceptions
is DR intercepts which is for SEV-ES before sev_es_init_vmcb() runs.

Move the SEV-ES intercept setup to sev_es_init_vmcb(). From now on
set_dr_intercepts()/clr_dr_intercepts() handle SVM/SEV only.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Santosh Shukla <santosh.shukla@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-6-aik@amd.com
[sean: drop comment about intercepting DR7]
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 2837dd00 15-Jun-2023 Alexey Kardashevskiy <aik@amd.com>

KVM: SEV-ES: explicitly disable debug

SVM/SEV enable debug registers intercepts to skip swapping DRs
on entering/exiting the guest. When the guest is in control of
debug registers (vcpu->guest_debug == 0), there is an optimisation to
reduce the number of context switches: intercepts are cleared and
the KVM_DEBUGREG_WONT_EXIT flag is set to tell KVM to do swapping
on guest enter/exit.

The same code also executes for SEV-ES, however it has no effect as
- it always takes (vcpu->guest_debug == 0) branch;
- KVM_DEBUGREG_WONT_EXIT is set but DR7 intercept is not cleared;
- vcpu_enter_guest() writes DRs but VMRUN for SEV-ES swaps them
with the values from _encrypted_ VMSA.

Be explicit about SEV-ES not supporting debug:
- return right away from dr_interception() and skip unnecessary processing;
- return an error right away from the KVM_SEV_LAUNCH_UPDATE_VMSA handler
if debugging was already enabled.
KVM_SET_GUEST_DEBUG are failing already after KVM_SEV_LAUNCH_UPDATE_VMSA
is finished due to vcpu->arch.guest_state_protected set to true.

Add WARN_ON to kvm_x86::sync_dirty_debug_regs() (saves guest DRs on
guest exit) to signify that SEV-ES won't hit that path.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-5-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# f8d808ed 15-Jun-2023 Sean Christopherson <seanjc@google.com>

KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment about swap types

Rewrite the comment(s) in sev_es_prepare_switch_to_guest() to explain the
swap types employed by the CPU for SEV-ES guests, i.e. to explain why KVM
needs to save a seemingly random subset of host state, and to provide a
decoder for the APM's Type-A/B/C terminology.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-4-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 29de732c 15-Jun-2023 Alexey Kardashevskiy <aik@amd.com>

KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV

Currently SVM setup is done sequentially in
init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb() and tries
keeping SVM/SEV/SEV-ES bits separated. One of the exceptions
is #GP intercept which init_vmcb() skips setting for SEV guests and
then sev_es_init_vmcb() needlessly clears it.

Remove the SEV check from init_vmcb(). Clear the #GP intercept in
sev_init_vmcb(). SEV-ES will use the SEV setting.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Santosh Shukla <santosh.shukla@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-3-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 63dbc67c 04-Aug-2023 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: remove ghcb variable declarations

To avoid possible time-of-check/time-of-use issues, the GHCB should
almost never be accessed outside dump_ghcb, sev_es_sync_to_ghcb
and sev_es_sync_from_ghcb. The only legitimate uses are to set the
exitinfo fields and to find the address of the scratch area embedded
in the ghcb. Accessing ghcb_usage also goes through svm->sev_es.ghcb
in sev_es_validate_vmgexit(), but that is because anyway the value is
not used.

Removing a shortcut variable that contains the value of svm->sev_es.ghcb
makes these cases a bit more verbose, but it limits the chance of someone
reading the ghcb by mistake.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 7588dbce 03-Aug-2023 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: only access GHCB fields once

A KVM guest using SEV-ES or SEV-SNP with multiple vCPUs can trigger
a double fetch race condition vulnerability and invoke the VMGEXIT
handler recursively.

sev_handle_vmgexit() maps the GHCB page using kvm_vcpu_map() and then
fetches the exit code using ghcb_get_sw_exit_code(). Soon after,
sev_es_validate_vmgexit() fetches the exit code again. Since the GHCB
page is shared with the guest, the guest is able to quickly swap the
values with another vCPU and hence bypass the validation. One vmexit code
that can be rejected by sev_es_validate_vmgexit() is SVM_EXIT_VMGEXIT;
if sev_handle_vmgexit() observes it in the second fetch, the call
to svm_invoke_exit_handler() will invoke sev_handle_vmgexit() again
recursively.

To avoid the race, always fetch the GHCB data from the places where
sev_es_sync_from_ghcb stores it.

Exploiting recursions on linux kernel has been proven feasible
in the past, but the impact is mitigated by stack guard pages
(CONFIG_VMAP_STACK). Still, if an attacker manages to call the handler
multiple times, they can theoretically trigger a stack overflow and
cause a denial-of-service, or potentially guest-to-host escape in kernel
configurations without stack guard pages.

Note that winning the race reliably in every iteration is very tricky
due to the very tight window of the fetches; depending on the compiler
settings, they are often consecutive because of optimization and inlining.

Tested by booting an SEV-ES RHEL9 guest.

Fixes: CVE-2023-4155
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Cc: stable@vger.kernel.org
Reported-by: Andy Nguyen <theflow@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 4e15a0dd 03-Aug-2023 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: snapshot the GHCB before accessing it

Validation of the GHCB is susceptible to time-of-check/time-of-use vulnerabilities.
To avoid them, we would like to always snapshot the fields that are read in
sev_es_validate_vmgexit(), and not use the GHCB anymore after it returns.

This means:

- invoking sev_es_sync_from_ghcb() before any GHCB access, including before
sev_es_validate_vmgexit()

- snapshotting all fields including the valid bitmap and the sw_scratch field,
which are currently not caching anywhere.

The valid bitmap is the first thing to be copied out of the GHCB; then,
further accesses will use the copy in svm->sev_es.

Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 106ed2ca 06-Jun-2023 Sean Christopherson <seanjc@google.com>

KVM: SVM: WARN, but continue, if misc_cg_set_capacity() fails

WARN and continue if misc_cg_set_capacity() fails, as the only scenario
in which it can fail is if the specified resource is invalid, which should
never happen when CONFIG_KVM_AMD_SEV=y. Deliberately not bailing "fixes"
a theoretical bug where KVM would leak the ASID bitmaps on failure, which
again can't happen.

If the impossible should happen, the end result is effectively the same
with respect to SEV and SEV-ES (they are unusable), while continuing on
has the advantage of letting KVM load, i.e. userspace can still run
non-SEV guests.

Reported-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Link: https://lore.kernel.org/r/20230607004449.1421131-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 6d1bc975 22-May-2023 Alexander Mikhalitsyn <alexander@mihalicyn.com>

KVM: SVM: enhance info printk's in SEV init

Let's print available ASID ranges for SEV/SEV-ES guests.
This information can be useful for system administrator
to debug if SEV/SEV-ES fails to enable.

There are a few reasons.
SEV:
- NPT is disabled (module parameter)
- CPU lacks some features (sev, decodeassists)
- Maximum SEV ASID is 0

SEV-ES:
- mmio_caching is disabled (module parameter)
- CPU lacks sev_es feature
- Minimum SEV ASID value is 1 (can be adjusted in BIOS/UEFI)

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stéphane Graber <stgraber@ubuntu.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Link: https://lore.kernel.org/r/20230522161249.800829-3-aleksandr.mikhalitsyn@canonical.com
[sean: print '0' for min SEV-ES ASID if there are no available ASIDs]
Signed-off-by: Sean Christopherson <seanjc@google.com>


# d2084fd8 14-May-2022 Al Viro <viro@zeniv.linux.org.uk>

SVM-SEV: convert the rest of fget() uses to fdget() in there

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# ae7d45fb 10-Mar-2023 Mario Limonciello <mario.limonciello@amd.com>

crypto: ccp - Add a header for multiple drivers to use `__psp_pa`

The TEE subdriver for CCP, the amdtee driver and the i2c-designware-amdpsp
drivers all include `psp-sev.h` even though they don't use SEV
functionality.

Move the definition of `__psp_pa` into a common header to be included
by all of these drivers.

Reviewed-by: Jan Dabros <jsd@semihalf.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> # For the drivers/i2c/busses/i2c-designware-amdpsp.c
Acked-by: Sumit Garg <sumit.garg@linaro.org> # For TEE subsystem bits
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Sean Christopherson <seanjc@google.com> # KVM
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# f94f053a 07-Feb-2023 Peter Gonda <pgonda@google.com>

KVM: SVM: Fix potential overflow in SEV's send|receive_update_data()

KVM_SEV_SEND_UPDATE_DATA and KVM_SEV_RECEIVE_UPDATE_DATA have an integer
overflow issue. Params.guest_len and offset are both 32 bits wide, with a
large params.guest_len the check to confirm a page boundary is not
crossed can falsely pass:

/* Check if we are crossing the page boundary *
offset = params.guest_uaddr & (PAGE_SIZE - 1);
if ((params.guest_len + offset > PAGE_SIZE))

Add an additional check to confirm that params.guest_len itself is not
greater than PAGE_SIZE.

Note, this isn't a security concern as overflow can happen if and only if
params.guest_len is greater than 0xfffff000, and the FW spec says these
commands fail with lengths greater than 16KB, i.e. the PSP will detect
KVM's goof.

Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
Fixes: d3d1af85e2c7 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
Reported-by: Andy Nguyen <theflow@google.com>
Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230207171354.4012821-1-pgonda@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# a31b531c 13-Jan-2023 Anish Ghulati <aghulati@google.com>

KVM: SVM: Account scratch allocations used to decrypt SEV guest memory

Account the temp/scratch allocation used to decrypt unaligned debug
accesses to SEV guest memory, the allocation is very much tied to the
target VM.

Reported-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Anish Ghulati <aghulati@google.com>
Link: https://lore.kernel.org/r/20230113220923.2834699-1-aghulati@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 8d20bd63 30-Nov-2022 Sean Christopherson <seanjc@google.com>

KVM: x86: Unify pr_fmt to use module name for all KVM modules

Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks
use consistent formatting across common x86, Intel, and AMD code. In
addition to providing consistent print formatting, using KBUILD_MODNAME,
e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and
SGX and ...) as technologies without generating weird messages, and
without causing naming conflicts with other kernel code, e.g. "SEV: ",
"tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems.

Opportunistically move away from printk() for prints that need to be
modified anyways, e.g. to drop a manual "kvm: " prefix.

Opportunistically convert a few SGX WARNs that are similarly modified to
WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good
that they would fire repeatedly and spam the kernel log without providing
unique information in each print.

Note, defining pr_fmt yields undesirable results for code that uses KVM's
printk wrappers, e.g. vcpu_unimpl(). But, that's a pre-existing problem
as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's
wrappers is relatively limited in KVM x86 code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20221130230934.1014142-35-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# a8a12c00 28-Sep-2022 Zhao Liu <zhao1.liu@intel.com>

KVM: SVM: Replace kmap_atomic() with kmap_local_page()

The use of kmap_atomic() is being deprecated in favor of
kmap_local_page()[1].

The main difference between atomic and local mappings is that local
mappings don't disable page faults or preemption.

There're 2 reasons we can use kmap_local_page() here:
1. SEV is 64-bit only and kmap_local_page() doesn't disable migration in
this case, but here the function clflush_cache_range() uses CLFLUSHOPT
instruction to flush, and on x86 CLFLUSHOPT is not CPU-local and flushes
the page out of the entire cache hierarchy on all CPUs (APM volume 3,
chapter 3, CLFLUSHOPT). So there's no need to disable preemption to ensure
CPU-local.
2. clflush_cache_range() doesn't need to disable pagefault and the mapping
is still valid even if sleeps. This is also true for sched out/in when
preempted.

In addition, though kmap_local_page() is a thin wrapper around
page_address() on 64-bit, kmap_local_page() should still be used here in
preference to page_address() since page_address() isn't suitable to be used
in a generic function (like sev_clflush_pages()) where the page passed in
is not easy to determine the source of allocation. Keeping the kmap* API in
place means it can be used for things other than highmem mappings[2].

Therefore, sev_clflush_pages() is a function that should use
kmap_local_page() in place of kmap_atomic().

Convert the calls of kmap_atomic() / kunmap_atomic() to kmap_local_page() /
kunmap_local().

[1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com
[2]: https://lore.kernel.org/lkml/5d667258-b58b-3d28-3609-e7914c99b31b@intel.com/

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220928092748.463631-1-zhao1.liu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# d08b4858 24-Oct-2022 Carlos Bilbao <carlos.bilbao@amd.com>

KVM: SVM: Name and check reserved fields with structs offset

Rename reserved fields on all structs in arch/x86/include/asm/svm.h
following their offset within the structs. Include compile time checks for
this in the same place where other BUILD_BUG_ON for the structs are.

This also solves that fields of struct sev_es_save_area are named by their
order of appearance, but right now they jump from reserved_5 to reserved_7.

Link: https://lkml.org/lkml/2022/10/22/376
Signed-off-by: Carlos Bilbao <carlos.bilbao@amd.com>
Message-Id: <20221024164448.203351-1-carlos.bilbao@amd.com>
[Use ASSERT_STRUCT_OFFSET + fix a couple wrong offsets. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 0bd8bd2f 04-Nov-2022 Peter Gonda <pgonda@google.com>

KVM: SVM: Only dump VMSA to klog at KERN_DEBUG level

Explicitly print the VMSA dump at KERN_DEBUG log level, KERN_CONT uses
KERNEL_DEFAULT if the previous log line has a newline, i.e. if there's
nothing to continuing, and as a result the VMSA gets dumped when it
shouldn't.

The KERN_CONT documentation says it defaults back to KERNL_DEFAULT if the
previous log line has a newline. So switch from KERN_CONT to
print_hex_dump_debug().

Jarkko pointed this out in reference to the original patch. See:
https://lore.kernel.org/all/YuPMeWX4uuR1Tz3M@kernel.org/
print_hex_dump(KERN_DEBUG, ...) was pointed out there, but
print_hex_dump_debug() should similar.

Fixes: 6fac42f127b8 ("KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog")
Signed-off-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Harald Hoyer <harald@profian.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Message-Id: <20221104142220.469452-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 73412dfe 09-Nov-2022 Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: do not allocate struct svm_cpu_data dynamically

The svm_data percpu variable is a pointer, but it is allocated via
svm_hardware_setup() when KVM is loaded. Unlike hardware_enable()
this means that it is never NULL for the whole lifetime of KVM, and
static allocation does not waste any memory compared to the status quo.
It is also more efficient and more easily handled from assembly code,
so do it and don't look back.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 0c29397a 03-Aug-2022 Sean Christopherson <seanjc@google.com>

KVM: SVM: Disable SEV-ES support if MMIO caching is disable

Disable SEV-ES if MMIO caching is disabled as SEV-ES relies on MMIO SPTEs
generating #NPF(RSVD), which are reflected by the CPU into the guest as
a #VC. With SEV-ES, the untrusted host, a.k.a. KVM, doesn't have access
to the guest instruction stream or register state and so can't directly
emulate in response to a #NPF on an emulated MMIO GPA. Disabling MMIO
caching means guest accesses to emulated MMIO ranges cause #NPF(!PRESENT),
and those flavors of #NPF cause automatic VM-Exits, not #VC.

Adjust KVM's MMIO masks to account for the C-bit location prior to doing
SEV(-ES) setup, and document that dependency between adjusting the MMIO
SPTE mask and SEV(-ES) setup.

Fixes: b09763da4dd8 ("KVM: x86/mmu: Add module param to disable MMIO caching (for testing)")
Reported-by: Michael Roth <michael.roth@amd.com>
Tested-by: Michael Roth <michael.roth@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220803224957.1285926-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 6fac42f1 27-Jul-2022 Jarkko Sakkinen <jarkko@kernel.org>

KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog

As Virtual Machine Save Area (VMSA) is essential in troubleshooting
attestation, dump it to the klog with the KERN_DEBUG level of priority.

Cc: Jarkko Sakkinen <jarkko@kernel.org>
Suggested-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Message-Id: <20220728050919.24113-1-jarkko@profian.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# e5380f6d 13-Jun-2022 Sean Christopherson <seanjc@google.com>

KVM: SVM: Hide SEV migration lockdep goo behind CONFIG_PROVE_LOCKING

Wrap the manipulation of @role and the manual mutex_{release,acquire}()
invocations in CONFIG_PROVE_LOCKING=y to squash a clang-15 warning. When
building with -Wunused-but-set-parameter and CONFIG_DEBUG_LOCK_ALLOC=n,
clang-15 seees there's no usage of @role in mutex_lock_killable_nested()
and yells. PROVE_LOCKING selects DEBUG_LOCK_ALLOC, and the only reason
KVM manipulates @role is to make PROVE_LOCKING happy.

To avoid true ugliness, use "i" and "j" to detect the first pass in the
loops; the "idx" field that's used by kvm_for_each_vcpu() is guaranteed
to be '0' on the first pass as it's simply the first entry in the vCPUs
XArray, which is fully KVM controlled. kvm_for_each_vcpu() passes '0'
for xa_for_each_range()'s "start", and xa_for_each_range() will not enter
the loop if there's no entry at '0'.

Fixes: 0c2c7c069285 ("KVM: SEV: Mark nested locking of vcpu->lock")
Reported-by: kernel test robot <lkp@intel.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220613214237.2538266-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 6defa24d 23-Jun-2022 Peter Gonda <pgonda@google.com>

KVM: SEV: Init target VMCBs in sev_migrate_from

The target VMCBs during an intra-host migration need to correctly setup
for running SEV and SEV-ES guests. Add sev_init_vmcb() function and make
sev_es_init_vmcb() static. sev_init_vmcb() uses the now private function
to init SEV-ES guests VMCBs when needed.

Fixes: 0b020f5af092 ("KVM: SEV: Add support for SEV-ES intra host migration")
Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration")
Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20220623173406.744645-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# ebdec859 23-Jun-2022 Mingwei Zhang <mizhang@google.com>

KVM: x86/svm: add __GFP_ACCOUNT to __sev_dbg_{en,de}crypt_user()

Adding the accounting flag when allocating pages within the SEV function,
since these memory pages should belong to individual VM.

No functional change intended.

Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220623171858.2083637-1-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# d22d2474 16-May-2022 Ashish Kalra <ashish.kalra@amd.com>

KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak

For some sev ioctl interfaces, the length parameter that is passed maybe
less than or equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data
that PSP firmware returns. In this case, kmalloc will allocate memory
that is the size of the input rather than the size of the data.
Since PSP firmware doesn't fully overwrite the allocated buffer, these
sev ioctl interface may return uninitialized kernel slab memory.

Reported-by: Andy Nguyen <theflow@google.com>
Suggested-by: David Rientjes <rientjes@google.com>
Suggested-by: Peter Gonda <pgonda@google.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Fixes: eaf78265a4ab3 ("KVM: SVM: Move SEV code to separate file")
Fixes: 2c07ded06427d ("KVM: SVM: add support for SEV attestation command")
Fixes: 4cfdd47d6d95a ("KVM: SVM: Add KVM_SEV SEND_START command")
Fixes: d3d1af85e2c75 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
Fixes: eba04b20e4861 ("KVM: x86: Account a variety of miscellaneous allocations")
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Message-Id: <20220516154310.3685678-1-Ashish.Kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 296d5a17 19-Apr-2022 Babu Moger <babu.moger@amd.com>

KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts

The TSC_AUX virtualization feature allows AMD SEV-ES guests to securely use
TSC_AUX (auxiliary time stamp counter data) in the RDTSCP and RDPID
instructions. The TSC_AUX value is set using the WRMSR instruction to the
TSC_AUX MSR (0xC0000103). It is read by the RDMSR, RDTSCP and RDPID
instructions. If the read/write of the TSC_AUX MSR is intercepted, then
RDTSCP and RDPID must also be intercepted when TSC_AUX virtualization
is present. However, the RDPID instruction can't be intercepted. This means
that when TSC_AUX virtualization is present, RDTSCP and TSC_AUX MSR
read/write must not be intercepted for SEV-ES (or SEV-SNP) guests.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <165040164424.1399644.13833277687385156344.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# c24a950e 07-Apr-2022 Peter Gonda <pgonda@google.com>

KVM, SEV: Add KVM_EXIT_SHUTDOWN metadata for SEV-ES

If an SEV-ES guest requests termination, exit to userspace with
KVM_EXIT_SYSTEM_EVENT and a dedicated SEV_TERM type instead of -EINVAL
so that userspace can take appropriate action.

See AMD's GHCB spec section '4.1.13 Termination Request' for more details.

Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Gonda <pgonda@google.com>

Reported-by: kernel test robot <lkp@intel.com>
Message-Id: <20220407210233.782250-1-pgonda@google.com>
[Add documentatino. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 3dd2775b 05-Apr-2022 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Create a separate mapping for the SEV-ES save area

The save area for SEV-ES/SEV-SNP guests, as used by the hardware, is
different from the save area of a non SEV-ES/SEV-SNP guest.

This is the first step in defining the multiple save areas to keep them
separate and ensuring proper operation amongst the different types of
guests. Create an SEV-ES/SEV-SNP save area and adjust usage to the new
save area definition where needed.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Link: https://lore.kernel.org/r/20220405182743.308853-1-brijesh.singh@amd.com


# 0c2c7c06 02-May-2022 Peter Gonda <pgonda@google.com>

KVM: SEV: Mark nested locking of vcpu->lock

svm_vm_migrate_from() uses sev_lock_vcpus_for_migration() to lock all
source and target vcpu->locks. Unfortunately there is an 8 subclass
limit, so a new subclass cannot be used for each vCPU. Instead maintain
ownership of the first vcpu's mutex.dep_map using a role specific
subclass: source vs target. Release the other vcpu's mutex.dep_maps.

Fixes: b56639318bb2b ("KVM: SEV: Add support for SEV intra host migration")
Reported-by: John Sperbeck<jsperbeck@google.com>
Suggested-by: David Rientjes <rientjes@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Gonda <pgonda@google.com>

Message-Id: <20220502165807.529624-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 683412cc 20-Apr-2022 Mingwei Zhang <mizhang@google.com>

KVM: SEV: add cache flush to solve SEV cache incoherency issues

Flush the CPU caches when memory is reclaimed from an SEV guest (where
reclaim also includes it being unmapped from KVM's memslots). Due to lack
of coherency for SEV encrypted memory, failure to flush results in silent
data corruption if userspace is malicious/broken and doesn't ensure SEV
guest memory is properly pinned and unpinned.

Cache coherency is not enforced across the VM boundary in SEV (AMD APM
vol.2 Section 15.34.7). Confidential cachelines, generated by confidential
VM guests have to be explicitly flushed on the host side. If a memory page
containing dirty confidential cachelines was released by VM and reallocated
to another user, the cachelines may corrupt the new user at a later time.

KVM takes a shortcut by assuming all confidential memory remain pinned
until the end of VM lifetime. Therefore, KVM does not flush cache at
mmu_notifier invalidation events. Because of this incorrect assumption and
the lack of cache flushing, malicous userspace can crash the host kernel:
creating a malicious VM and continuously allocates/releases unpinned
confidential memory pages when the VM is running.

Add cache flush operations to mmu_notifier operations to ensure that any
physical memory leaving the guest VM get flushed. In particular, hook
mmu_notifier_invalidate_range_start and mmu_notifier_release events and
flush cache accordingly. The hook after releasing the mmu lock to avoid
contention with other vCPUs.

Cc: stable@vger.kernel.org
Suggested-by: Sean Christpherson <seanjc@google.com>
Reported-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-4-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# d45829b3 20-Apr-2022 Mingwei Zhang <mizhang@google.com>

KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs

Use clflush_cache_range() to flush the confidential memory when
SME_COHERENT is supported in AMD CPU. Cache flush is still needed since
SME_COHERENT only support cache invalidation at CPU side. All confidential
cache lines are still incoherent with DMA devices.

Cc: stable@vger.kerel.org

Fixes: add5e2f04541 ("KVM: SVM: Add support for the SEV-ES VMSA")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-3-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 4bbef7e8 20-Apr-2022 Sean Christopherson <seanjc@google.com>

KVM: SVM: Simplify and harden helper to flush SEV guest page(s)

Rework sev_flush_guest_memory() to explicitly handle only a single page,
and harden it to fall back to WBINVD if VM_PAGE_FLUSH fails. Per-page
flushing is currently used only to flush the VMSA, and in its current
form, the helper is completely broken with respect to flushing actual
guest memory, i.e. won't work correctly for an arbitrary memory range.

VM_PAGE_FLUSH takes a host virtual address, and is subject to normal page
walks, i.e. will fault if the address is not present in the host page
tables or does not have the correct permissions. Current AMD CPUs also
do not honor SMAP overrides (undocumented in kernel versions of the APM),
so passing in a userspace address is completely out of the question. In
other words, KVM would need to manually walk the host page tables to get
the pfn, ensure the pfn is stable, and then use the direct map to invoke
VM_PAGE_FLUSH. And the latter might not even work, e.g. if userspace is
particularly evil/clever and backs the guest with Secret Memory (which
unmaps memory from the direct map).

Signed-off-by: Sean Christopherson <seanjc@google.com>

Fixes: add5e2f04541 ("KVM: SVM: Add support for the SEV-ES VMSA")
Reported-by: Mingwei Zhang <mizhang@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-2-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# c538dc79 08-Apr-2022 Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

KVM: SVM: Do not activate AVIC for SEV-enabled guest

Since current AVIC implementation cannot support encrypted memory,
inhibit AVIC for SEV-enabled guest.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220408133710.54275-1-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 00c22013 30-Mar-2022 Peter Gonda <pgonda@google.com>

KVM: SEV: Add cond_resched() to loop in sev_clflush_pages()

Add resched to avoid warning from sev_clflush_pages() with large number
of pages.

Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Message-Id: <20220330164306.2376085-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# aa9f5841 25-Feb-2022 Sean Christopherson <seanjc@google.com>

KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors

Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...

As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.

And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.

KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.

Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.

Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# b2125513 11-Feb-2022 Peter Gonda <pgonda@google.com>

KVM: SEV: Allow SEV intra-host migration of VM with mirrors

For SEV-ES VMs with mirrors to be intra-host migrated they need to be
able to migrate with the mirror. This is due to that fact that all VMSAs
need to be added into the VM with LAUNCH_UPDATE_VMSA before
lAUNCH_FINISH. Allowing migration with mirrors allows users of SEV-ES to
keep the mirror VMs VMSAs during migration.

Adds a list of mirror VMs for the original VM iterate through during its
migration. During the iteration the owner pointers can be updated from
the source to the destination. This fixes the ASID leaking issue which
caused the blocking of migration of VMs with mirrors.

Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20220211193634.3183388-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 23e5092b 27-Jan-2022 Sean Christopherson <seanjc@google.com>

KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names

Massage SVM's implementation names that still diverge from kvm_x86_ops to
allow for wiring up all SVM-defined functions via kvm-x86-ops.h.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-22-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 559c7c75 27-Jan-2022 Sean Christopherson <seanjc@google.com>

KVM: SVM: Rename SEV implemenations to conform to kvm_x86_ops hooks

Rename svm_vm_copy_asid_from() and svm_vm_migrate_from() to conform to
the names used by kvm_x86_ops, and opportunistically use "sev" instead of
"svm" to more precisely identify the role of the hooks.

svm_vm_copy_asid_from() in particular was poorly named as the function
does much more than simply copy the ASID.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-21-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 03d004cd 27-Jan-2022 Sean Christopherson <seanjc@google.com>

KVM: x86: Use more verbose names for mem encrypt kvm_x86_ops hooks

Use slightly more verbose names for the so called "memory encrypt",
a.k.a. "mem enc", kvm_x86_ops hooks to bridge the gap between the current
super short kvm_x86_ops names and SVM's more verbose, but non-conforming
names. This is a step toward using kvm-x86-ops.h with KVM_X86_CVM_OP()
to fill svm_x86_ops.

Opportunistically rename mem_enc_op() to mem_enc_ioctl() to better
reflect its true nature, as it really is a full fledged ioctl() of its
own. Ideally, the hook would be named confidential_vm_ioctl() or so, as
the ioctl() is a gateway to more than just memory encryption, and because
its underlying purpose to support Confidential VMs, which can be provided
without memory encryption, e.g. if the TCB of the guest includes the host
kernel but not host userspace, or by isolation in hardware without
encrypting memory. But, diverging from KVM_MEMORY_ENCRYPT_OP even
further is undeseriable, and short of creating alises for all related
ioctl()s, which introduces a different flavor of divergence, KVM is stuck
with the nomenclature.

Defer renaming SVM's functions to a future commit as there are additional
changes needed to make SVM fully conforming and to match reality (looking
at you, svm_vm_copy_asid_from()).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-20-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 068f7ea6 25-Jan-2022 Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch

KVM performs the VMSAVE to the host save area for both regular and SEV-ES
guests, so hoist it up to svm_prepare_guest_switch. And because
sev_es_prepare_guest_switch does not really need to know the details
of struct svm_cpu_data *, just pass it the pointer to the host save area
inside the HSAVE page.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# c532f290 19-Jan-2022 Sean Christopherson <seanjc@google.com>

KVM: SVM: Explicitly require DECODEASSISTS to enable SEV support

Add a sanity check on DECODEASSIST being support if SEV is supported, as
KVM cannot read guest private memory and thus relies on the CPU to
provide the instruction byte stream on #NPF for emulation. The intent of
the check is to document the dependency, it should never fail in practice
as producing hardware that supports SEV but not DECODEASSISTS would be
non-sensical.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 46808a4c 16-Nov-2021 Marc Zyngier <maz@kernel.org>

KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index

Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu
index. Unfortunately, we're about to move rework the iterator,
which requires this to be upgrade to an unsigned long.

Let's bite the bullet and repaint all of it in one go.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-7-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 597cb796 04-Jan-2022 Wanpeng Li <wanpengli@tencent.com>

KVM: SEV: Mark nested locking of kvm->lock

Both source and dest vms' kvm->locks are held in sev_lock_two_vms.
Mark one with a different subtype to avoid false positives from lockdep.

Fixes: c9d61dcb0bc26 (KVM: SEV: accept signals in sev_lock_two_vms)
Reported-by: Yiru Xu <xyru1999@gmail.com>
Tested-by: Jinrong Liang <cloudliang@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1641364863-26331-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# ad5b3532 01-Dec-2021 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure

Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT
exit code or exit parameters fails.

The VMGEXIT instruction can be issued from userspace, even though
userspace (likely) can't update the GHCB. To prevent userspace from being
able to kill the guest, return an error through the GHCB when validation
fails rather than terminating the guest. For cases where the GHCB can't be
updated (e.g. the GHCB can't be mapped, etc.), just return back to the
guest.

The new error codes are documented in the lasest update to the GHCB
specification.

Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <b57280b5562893e2616257ac9c2d4525a9aeeb42.1638471124.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# a655276a 09-Nov-2021 Sean Christopherson <seanjc@google.com>

KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessary

Use kvzalloc() to allocate KVM's buffer for SEV-ES's GHCB scratch area so
that KVM falls back to __vmalloc() if physically contiguous memory isn't
available. The buffer is purely a KVM software construct, i.e. there's
no need for it to be physically contiguous.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109222350.2266045-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 75236f5f 09-Nov-2021 Sean Christopherson <seanjc@google.com>

KVM: SEV: Return appropriate error codes if SEV-ES scratch setup fails

Return appropriate error codes if setting up the GHCB scratch area for an
SEV-ES guest fails. In particular, returning -EINVAL instead of -ENOMEM
when allocating the kernel buffer could be confusing as userspace would
likely suspect a guest issue.

Fixes: 8f423a80d299 ("KVM: SVM: Support MMIO for an SEV-ES guest")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109222350.2266045-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# c9d61dcb 22-Nov-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: accept signals in sev_lock_two_vms

Generally, kvm->lock is not taken for a long time, but
sev_lock_two_vms is different: it takes vCPU locks
inside, so userspace can hold it back just by calling
a vCPU ioctl. Play it safe and use mutex_lock_killable.

Message-Id: <20211123005036.2954379-13-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 10a37929 22-Nov-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: do not take kvm->lock when destroying

Taking the lock is useless since there are no other references,
and there are already accesses (e.g. to sev->enc_context_owner)
that do not take it. So get rid of it.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211123005036.2954379-12-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 17d44a96 22-Nov-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: Prohibit migration of a VM that has mirrors

VMs that mirror an encryption context rely on the owner to keep the
ASID allocated. Performing a KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
would cause a dangling ASID:

1. copy context from A to B (gets ref to A)
2. move context from A to L (moves ASID from A to L)
3. close L (releases ASID from L, B still references it)

The right way to do the handoff instead is to create a fresh mirror VM
on the destination first:

1. copy context from A to B (gets ref to A)
[later] 2. close B (releases ref to A)
3. move context from A to L (moves ASID from A to L)
4. copy context from L to M

So, catch the situation by adding a count of how many VMs are
mirroring this one's encryption context.

Fixes: 0b020f5af092 ("KVM: SEV: Add support for SEV-ES intra host migration")
Message-Id: <20211123005036.2954379-11-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# bf42b02b 22-Nov-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: Do COPY_ENC_CONTEXT_FROM with both VMs locked

Now that we have a facility to lock two VMs with deadlock
protection, use it for the creation of mirror VMs as well. One of
COPY_ENC_CONTEXT_FROM(dst, src) and COPY_ENC_CONTEXT_FROM(src, dst)
would always fail, so the combination is nonsensical and it is okay to
return -EBUSY if it is attempted.

This sidesteps the question of what happens if a VM is
MOVE_ENC_CONTEXT_FROM'd at the same time as it is
COPY_ENC_CONTEXT_FROM'd: the locking prevents that from
happening.

Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211123005036.2954379-10-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 642525e3 22-Nov-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: move mirror status to destination of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM

Allow intra-host migration of a mirror VM; the destination VM will be
a mirror of the same ASID as the source.

Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211123005036.2954379-8-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 2b347a38 22-Nov-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: initialize regions_list of a mirror VM

This was broken before the introduction of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM,
but technically harmless because the region list was unused for a mirror
VM. However, it is untidy and it now causes a NULL pointer access when
attempting to move the encryption context of a mirror VM.

Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context")
Message-Id: <20211123005036.2954379-7-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 501b580c 22-Nov-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: cleanup locking for KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM

Encapsulate the handling of the migration_in_progress flag for both VMs in
two functions sev_lock_two_vms and sev_unlock_two_vms. It does not matter
if KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM locks the destination struct kvm a bit
later, and this change 1) keeps the cleanup chain of labels smaller 2)
makes it possible for KVM_CAP_VM_COPY_ENC_CONTEXT_FROM to reuse the logic.

Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Message-Id: <20211123005036.2954379-6-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 4674164f 22-Nov-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: do not use list_replace_init on an empty list

list_replace_init cannot be used if the source is an empty list,
because "new->next->prev = new" will overwrite "old->next":

new old
prev = new, next = new prev = old, next = old
new->next = old->next prev = new, next = old prev = old, next = old
new->next->prev = new prev = new, next = old prev = old, next = new
new->prev = old->prev prev = old, next = old prev = old, next = old
new->next->prev = new prev = old, next = old prev = new, next = new

The desired outcome instead would be to leave both old and new the same
as they were (two empty circular lists). Use list_cut_before, which
already has the necessary check and is documented to discard the
previous contents of the list that will hold the result.

Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211123005036.2954379-5-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 8e38e96a 09-Nov-2021 Sean Christopherson <seanjc@google.com>

KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()

Rename cmd_allowed_from_miror() to is_cmd_allowed_from_mirror(), fixing
a typo and making it obvious that the result is a boolean where
false means "not allowed".

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# ea410ef4 09-Nov-2021 Sean Christopherson <seanjc@google.com>

KVM: SEV: Drop a redundant setting of sev->asid during initialization

Remove a fully redundant write to sev->asid during SEV/SEV-ES guest
initialization. The ASID is set a few lines earlier prior to the call to
sev_platform_init(), which doesn't take "sev" as a param, i.e. can't
muck with the ASID barring some truly magical behind-the-scenes code.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# a41fb26e 09-Nov-2021 Sean Christopherson <seanjc@google.com>

KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()

Set sev_info.active during SEV/SEV-ES activation before calling any code
that can potentially consume sev_info.es_active, e.g. set "active" and
"es_active" as a pair immediately after the initial sanity checks. KVM
generally expects that es_active can be true if and only if active is
true, e.g. sev_asid_new() deliberately avoids sev_es_guest() so that it
doesn't get a false negative. This will allow WARNing in sev_es_guest()
if the VM is tagged as SEV-ES but not SEV.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 79b11142 09-Nov-2021 Sean Christopherson <seanjc@google.com>

KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs

Reject COPY_ENC_CONTEXT_FROM if the destination VM has created vCPUs.
KVM relies on SEV activation to occur before vCPUs are created, e.g. to
set VMCB flags and intercepts correctly.

Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Nathan Tempelman <natet@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109215101.2211373-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 501cfe06 12-Nov-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: unify cgroup cleanup code for svm_vm_migrate_from

Use the same cleanup code independent of whether the cgroup to be
uncharged and unref'd is the source or the destination cgroup. Use a
bool to track whether the destination cgroup has been charged, which also
fixes a bug in the error case: the destination cgroup must be uncharged
only if it does not match the source.

Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 0b020f5a 21-Oct-2021 Peter Gonda <pgonda@google.com>

KVM: SEV: Add support for SEV-ES intra host migration

For SEV-ES to work with intra host migration the VMSAs, GHCB metadata,
and other SEV-ES info needs to be preserved along with the guest's
memory.

Signed-off-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20211021174303.385706-4-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# b5663931 21-Oct-2021 Peter Gonda <pgonda@google.com>

KVM: SEV: Add support for SEV intra host migration

For SEV to work with intra host migration, contents of the SEV info struct
such as the ASID (used to index the encryption key in the AMD SP) and
the list of memory regions need to be transferred to the target VM.
This change adds a commands for a target VMM to get a source SEV VM's sev
info.

Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20211021174303.385706-3-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 91b692a0 11-Nov-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: provide helpers to charge/uncharge misc_cg

Avoid code duplication across all callers of misc_cg_try_charge and
misc_cg_uncharge. The resource type for KVM is always derived from
sev->es_active, and the quantity is always 1.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# b67a4cc3 21-Oct-2021 Peter Gonda <pgonda@google.com>

KVM: SEV: Refactor out sev_es_state struct

Move SEV-ES vCPU metadata into new sev_es_state struct from vcpu_svm.

Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20211021174303.385706-2-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 9ebe530b 20-Sep-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Move RESET emulation to svm_vcpu_reset()

Move RESET emulation for SVM vCPUs to svm_vcpu_reset(), and drop an extra
init_vmcb() from svm_create_vcpu() in the process. Hopefully KVM will
someday expose a dedicated RESET ioctl(), and in the meantime separating
"create" from "RESET" is a nice cleanup.

Keep the call to svm_switch_vmcb() so that misuse of svm->vmcb at worst
breaks the guest, e.g. premature accesses doesn't cause a NULL pointer
dereference.

Cc: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210921000303.400537-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# d9d005f3 14-Oct-2021 Thomas Gleixner <tglx@linutronix.de>

x86/fpu: Move mxcsr related code to core

No need to expose that to code which only needs the XCR0 accessors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011539.740012411@linutronix.de


# 9b0971ca 24-Oct-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV-ES: fix another issue with string I/O VMGEXITs

If the guest requests string I/O from the hypervisor via VMGEXIT,
SW_EXITINFO2 will contain the REP count. However, sev_es_string_io
was incorrectly treating it as the size of the GHCB buffer in
bytes.

This fixes the "outsw" test in the experimental SEV tests of
kvm-unit-tests.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reported-by: Marc Orr <marcorr@google.com>
Tested-by: Marc Orr <marcorr@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# c8c340a9 14-Sep-2021 Masahiro Kozuka <masa.koz@kozuka.jp>

KVM: SEV: Flush cache on non-coherent systems before RECEIVE_UPDATE_DATA

Flush the destination page before invoking RECEIVE_UPDATE_DATA, as the
PSP encrypts the data with the guest's key when writing to guest memory.
If the target memory was not previously encrypted, the cache may contain
dirty, unecrypted data that will persist on non-coherent systems.

Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Masahiro Kozuka <masa.koz@kozuka.jp>
[sean: converted bug report to changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210914210951.2994260-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# baa1e5ca 15-Oct-2021 Peter Gonda <pgonda@google.com>

KVM: SEV-ES: Set guest_state_protected after VMSA update

The refactoring in commit bb18a6777465 ("KVM: SEV: Acquire
vcpu mutex when updating VMSA") left behind the assignment to
svm->vcpu.arch.guest_state_protected; add it back.

Signed-off-by: Peter Gonda <pgonda@google.com>
[Delta between v2 and v3 of Peter's patch, which had already been
committed; the commit message is my own. - Paolo]
Fixes: bb18a6777465 ("KVM: SEV: Acquire vcpu mutex when updating VMSA")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 019057bd 12-Oct-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV-ES: fix length of string I/O

The size of the data in the scratch buffer is not divided by the size of
each port I/O operation, so vcpu->arch.pio.count ends up being larger
than it should be by a factor of size.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 5b92b6ca 21-Sep-2021 Peter Gonda <pgonda@google.com>

KVM: SEV: Allow some commands for mirror VM

A mirrored SEV-ES VM will need to call KVM_SEV_LAUNCH_UPDATE_VMSA to
setup its vCPUs and have them measured, and their VMSAs encrypted. Without
this change, it is impossible to have mirror VMs as part of SEV-ES VMs.

Also allow the guest status check and debugging commands since they do
not change any guest state.

Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Nathan Tempelman <natet@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steve Rutherford <srutherford@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context", 2021-04-21)
Message-Id: <20210921150345.2221634-3-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# f43c887c 21-Sep-2021 Peter Gonda <pgonda@google.com>

KVM: SEV: Update svm_vm_copy_asid_from for SEV-ES

For mirroring SEV-ES the mirror VM will need more then just the ASID.
The FD and the handle are required to all the mirror to call psp
commands. The mirror VM will need to call KVM_SEV_LAUNCH_UPDATE_VMSA to
setup its vCPUs' VMSAs for SEV-ES.

Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Nathan Tempelman <natet@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steve Rutherford <srutherford@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context", 2021-04-21)
Message-Id: <20210921150345.2221634-2-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 50c03801 14-Sep-2021 Sean Christopherson <seanjc@google.com>

KVM: SEV: Pin guest memory for write for RECEIVE_UPDATE_DATA

Require the target guest page to be writable when pinning memory for
RECEIVE_UPDATE_DATA. Per the SEV API, the PSP writes to guest memory:

The result is then encrypted with GCTX.VEK and written to the memory
pointed to by GUEST_PADDR field.

Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210914210951.2994260-2-seanjc@google.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# f1815e0a 12-Sep-2021 Mingwei Zhang <mizhang@google.com>

KVM: SVM: fix missing sev_decommission in sev_receive_start

DECOMMISSION the current SEV context if binding an ASID fails after
RECEIVE_START. Per AMD's SEV API, RECEIVE_START generates a new guest
context and thus needs to be paired with DECOMMISSION:

The RECEIVE_START command is the only command other than the LAUNCH_START
command that generates a new guest context and guest handle.

The missing DECOMMISSION can result in subsequent SEV launch failures,
as the firmware leaks memory and might not able to allocate more SEV
guest contexts in the future.

Note, LAUNCH_START suffered the same bug, but was previously fixed by
commit 934002cd660b ("KVM: SVM: Call SEV Guest Decommission if ASID
binding fails").

Cc: Alper Gun <alpergun@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: David Rienjes <rientjes@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: John Allen <john.allen@amd.com>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vipin Sharma <vipinsh@google.com>
Cc: stable@vger.kernel.org
Reviewed-by: Marc Orr <marcorr@google.com>
Acked-by: Brijesh Singh <brijesh.singh@amd.com>
Fixes: af43cbbf954b ("KVM: SVM: Add support for KVM_SEV_RECEIVE_START command")
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210912181815.3899316-1-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# bb18a677 15-Sep-2021 Peter Gonda <pgonda@google.com>

KVM: SEV: Acquire vcpu mutex when updating VMSA

The update-VMSA ioctl touches data stored in struct kvm_vcpu, and
therefore should not be performed concurrently with any VCPU ioctl
that might cause KVM or the processor to use the same data.

Adds vcpu mutex guard to the VMSA updating code. Refactors out
__sev_launch_update_vmsa() function to deal with per vCPU parts
of sev_launch_update_vmsa().

Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20210915171755.3773766-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# ad0577c3 09-Aug-2021 Sean Christopherson <seanjc@google.com>

KVM: x86: Kill off __ex() and __kvm_handle_fault_on_reboot()

Remove the __kvm_handle_fault_on_reboot() and __ex() macros now that all
VMX and SVM instructions use asm goto to handle the fault (or in the
case of VMREAD, completely custom logic). Drop kvm_spurious_fault()'s
asmlinkage annotation as __kvm_handle_fault_on_reboot() was the only
flow that invoked it from assembly code.

Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210809173955.1710866-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# d0f9f826 13-Jul-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Stuff save->dr6 at during VMSA sync, not at RESET/INIT

Move code to stuff vmcb->save.dr6 to its architectural init value from
svm_vcpu_reset() into sev_es_sync_vmsa(). Except for protected guests,
a.k.a. SEV-ES guests, vmcb->save.dr6 is set during VM-Enter, i.e. the
extra write is unnecessary. For SEV-ES, stuffing save->dr6 handles a
theoretical case where the VMSA could be encrypted before the first
KVM_RUN.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-33-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# bb2baeb2 02-Aug-2021 Mingwei Zhang <mizhang@google.com>

KVM: SVM: improve the code readability for ASID management

KVM SEV code uses bitmaps to manage ASID states. ASID 0 was always skipped
because it is never used by VM. Thus, in existing code, ASID value and its
bitmap postion always has an 'offset-by-1' relationship.

Both SEV and SEV-ES shares the ASID space, thus KVM uses a dynamic range
[min_asid, max_asid] to handle SEV and SEV-ES ASIDs separately.

Existing code mixes the usage of ASID value and its bitmap position by
using the same variable called 'min_asid'.

Fix the min_asid usage: ensure that its usage is consistent with its name;
allocate extra size for ASID 0 to ensure that each ASID has the same value
with its bitmap position. Add comments on ASID bitmap allocation to clarify
the size change.

Signed-off-by: Mingwei Zhang <mizhang@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Marc Orr <marcorr@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Alper Gun <alpergun@google.com>
Cc: Dionna Glaze <dionnaglaze@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vipin Sharma <vipinsh@google.com>
Cc: Peter Gonda <pgonda@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Message-Id: <20210802180903.159381-1-mizhang@google.com>
[Fix up sev_asid_free to also index by ASID, as suggested by Sean
Christopherson, and use nr_asids in sev_cpu_init. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 179c6c27 03-Aug-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB

Use the raw ASID, not ASID-1, when nullifying the last used VMCB when
freeing an SEV ASID. The consumer, pre_sev_run(), indexes the array by
the raw ASID, thus KVM could get a false negative when checking for a
different VMCB if KVM manages to reallocate the same ASID+VMCB combo for
a new VM.

Note, this cannot cause a functional issue _in the current code_, as
pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and
last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is
guaranteed to mismatch on the first VMRUN. However, prior to commit
8a14fe4f0c54 ("kvm: x86: Move last_cpu into kvm_vcpu_arch as
last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the
last_cpu variable. Thus it's theoretically possible that older versions
of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID
and VMCB exactly match those of a prior VM.

Fixes: 70cd94e60c73 ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# c7a1b2b6 06-May-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Fix sev_pin_memory() error checks in SEV migration utilities

Use IS_ERR() instead of checking for a NULL pointer when querying for
sev_pin_memory() failures. sev_pin_memory() always returns an error code
cast to a pointer, or a valid pointer; it never returns NULL.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Steve Rutherford <srutherford@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Fixes: d3d1af85e2c7 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210506175826.2166383-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# b4a69392 06-May-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Return -EFAULT if copy_to_user() for SEV mig packet header fails

Return -EFAULT if copy_to_user() fails; if accessing user memory faults,
copy_to_user() returns the number of bytes remaining, not an error code.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Steve Rutherford <srutherford@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Fixes: d3d1af85e2c7 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210506175826.2166383-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 784a4661 23-Jun-2021 Dave Hansen <dave.hansen@linux.intel.com>

x86/pkeys: Move read_pkru() and write_pkru()

write_pkru() was originally used just to write to the PKRU register. It
was mercifully short and sweet and was not out of place in pgtable.h with
some other pkey-related code.

But, later work included a requirement to also modify the task XSAVE
buffer when updating the register. This really is more related to the
XSAVE architecture than to paging.

Move the read/write_pkru() to asm/pkru.h. pgtable.h won't miss them.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121455.102647114@linutronix.de


# 934002cd 10-Jun-2021 Alper Gun <alpergun@google.com>

KVM: SVM: Call SEV Guest Decommission if ASID binding fails

Send SEV_CMD_DECOMMISSION command to PSP firmware if ASID binding
fails. If a failure happens after a successful LAUNCH_START command,
a decommission command should be executed. Otherwise, guest context
will be unfreed inside the AMD SP. After the firmware will not have
memory to allocate more SEV guest context, LAUNCH_START command will
begin to fail with SEV_RET_RESOURCE_LIMIT error.

The existing code calls decommission inside sev_unbind_asid, but it is
not called if a failure happens before guest activation succeeds. If
sev_bind_asid fails, decommission is never called. PSP firmware has a
limit for the number of guests. If sev_asid_binding fails many times,
PSP firmware will not have resources to create another guest context.

Cc: stable@vger.kernel.org
Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command")
Reported-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Alper Gun <alpergun@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210610174604.2554090-1-alpergun@google.com>


# 4f13d471 07-Jun-2021 Ashish Kalra <ashish.kalra@amd.com>

KVM: SVM: Fix SEV SEND_START session length & SEND_UPDATE_DATA query length after commit 238eca821cee

Commit 238eca821cee ("KVM: SVM: Allocate SEV command structures on local stack")
uses the local stack to allocate the structures used to communicate with the PSP,
which were earlier being kzalloced. This breaks SEV live migration for
computing the SEND_START session length and SEND_UPDATE_DATA query length as
session_len and trans_len and hdr_len fields are not zeroed respectively for
the above commands before issuing the SEV Firmware API call, hence the
firmware returns incorrect session length and update data header or trans length.

Also the SEV Firmware API returns SEV_RET_INVALID_LEN firmware error
for these length query API calls, and the return value and the
firmware error needs to be passed to the userspace as it is, so
need to remove the return check in the KVM code.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <20210607061532.27459-1-Ashish.Kalra@amd.com>
Fixes: 238eca821cee ("KVM: SVM: Allocate SEV command structures on local stack")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# ce7ea0cf 06-May-2021 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Move GHCB unmapping to fix RCU warning

When an SEV-ES guest is running, the GHCB is unmapped as part of the
vCPU run support. However, kvm_vcpu_unmap() triggers an RCU dereference
warning with CONFIG_PROVE_LOCKING=y because the SRCU lock is released
before invoking the vCPU run support.

Move the GHCB unmapping into the prepare_guest_switch callback, which is
invoked while still holding the SRCU lock, eliminating the RCU dereference
warning.

Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <b2f9b79d15166f2c3e4375c0d9bc3268b7696455.1620332081.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 368340a3 06-May-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Invert user pointer casting in SEV {en,de}crypt helpers

Invert the user pointer params for SEV's helpers for encrypting and
decrypting guest memory so that they take a pointer and cast to an
unsigned long as necessary, as opposed to doing the opposite. Tagging a
non-pointer as __user is confusing and weird since a cast of some form
needs to occur to actually access the user data. This also fixes Sparse
warnings triggered by directly consuming the unsigned longs, which are
"noderef" due to the __user tag.

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210506231542.2331138-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 8899a5fc 30-Apr-2021 Colin Ian King <colin.king@canonical.com>

KVM: x86: Fix potential fput on a null source_kvm_file

The fget can potentially return null, so the fput on the error return
path can cause a null pointer dereference. Fix this by checking for
a null source_kvm_file before doing a fput.

Addresses-Coverity: ("Dereference null return")
Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Message-Id: <20210430170303.131924-1-colin.king@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 469bb32b 21-Apr-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Skip SEV cache flush if no ASIDs have been used

Skip SEV's expensive WBINVD and DF_FLUSH if there are no SEV ASIDs
waiting to be reclaimed, e.g. if SEV was never used. This "fixes" an
issue where the DF_FLUSH fails during hardware teardown if the original
SEV_INIT failed. Ideally, SEV wouldn't be marked as enabled in KVM if
SEV_INIT fails, but that's a problem for another day.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 82b7ae04 21-Apr-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()

Remove the forward declaration of sev_flush_asids(), which is only a few
lines above the function itself.

No functional change intended.

Reviewed by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# a5c1c5aa 21-Apr-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Drop redundant svm_sev_enabled() helper

Replace calls to svm_sev_enabled() with direct checks on sev_enabled, or
in the case of svm_mem_enc_op, simply drop the call to svm_sev_enabled().
This effectively replaces checks against a valid max_sev_asid with checks
against sev_enabled. sev_enabled is forced off by sev_hardware_setup()
if max_sev_asid is invalid, all call sites are guaranteed to run after
sev_hardware_setup(), and all of the checks care about SEV being fully
enabled (as opposed to intentionally handling the scenario where
max_sev_asid is valid but SEV enabling fails due to OOM).

Reviewed by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# b95c221c 21-Apr-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Move SEV VMCB tracking allocation to sev.c

Move the allocation of the SEV VMCB array to sev.c to help pave the way
toward encapsulating SEV enabling wholly within sev.c.

No functional change intended.

Reviewed by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 8cb756b7 21-Apr-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()

Query max_sev_asid directly after setting it instead of bouncing through
its wrapper, svm_sev_enabled(). Using the wrapper is unnecessary
obfuscation.

No functional change intended.

Reviewed by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 6c2c7bf5 21-Apr-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)

Enable the 'sev' and 'sev_es' module params by default instead of having
them conditioned on CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT. The extra
Kconfig is pointless as KVM SEV/SEV-ES support is already controlled via
CONFIG_KVM_AMD_SEV, and CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT has the
unfortunate side effect of enabling all the SEV-ES _guest_ code due to
it being dependent on CONFIG_AMD_MEM_ENCRYPT=y.

Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# a479c334 21-Apr-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y

Define sev_enabled and sev_es_enabled as 'false' and explicitly #ifdef
out all of sev_hardware_setup() if CONFIG_KVM_AMD_SEV=n. This kills
three birds at once:

- Makes sev_enabled and sev_es_enabled off by default if
CONFIG_KVM_AMD_SEV=n. Previously, they could be on by default if
CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y, regardless of KVM SEV
support.

- Hides the sev and sev_es modules params when CONFIG_KVM_AMD_SEV=n.

- Resolves a false positive -Wnonnull in __sev_recycle_asids() that is
currently masked by the equivalent IS_ENABLED(CONFIG_KVM_AMD_SEV)
check in svm_sev_enabled(), which will be dropped in a future patch.

Reviewed by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 8d364a07 21-Apr-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables

Rename sev and sev_es to sev_enabled and sev_es_enabled respectively to
better align with other KVM terminology, and to avoid pseudo-shadowing
when the variables are moved to sev.c in a future patch ('sev' is often
used for local struct kvm_sev_info pointers.

No functional change intended.

Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# d9db0fd6 21-Apr-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features

Add a reverse-CPUID entry for the memory encryption word, 0x8000001F.EAX,
and use it to override the supported CPUID flags reported to userspace.
Masking the reported CPUID flags avoids over-reporting KVM support, e.g.
without the mask a SEV-SNP capable CPU may incorrectly advertise SNP
support to userspace.

Clear SEV/SEV-ES if their corresponding module parameters are disabled,
and clear the memory encryption leaf completely if SEV is not fully
supported in KVM. Advertise SME_COHERENT in addition to SEV and SEV-ES,
as the guest can use SME_COHERENT to avoid CLFLUSH operations.

Explicitly omit SME and VM_PAGE_FLUSH from the reporting. These features
are used by KVM, but are not exposed to the guest, e.g. guest access to
related MSRs will fault.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# e8126bda 21-Apr-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Move SEV module params/variables to sev.c

Unconditionally invoke sev_hardware_setup() when configuring SVM and
handle clearing the module params/variable 'sev' and 'sev_es' in
sev_hardware_setup(). This allows making said variables static within
sev.c and reduces the odds of a collision with guest code, e.g. the guest
side of things has already laid claim to 'sev_enabled'.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# f31b88b3 21-Apr-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails

Free sev_asid_bitmap if the reclaim bitmap allocation fails, othwerise
KVM will unnecessarily keep the bitmap when SEV is not fully enabled.

Freeing the page is also necessary to avoid introducing a bug when a
future patch eliminates svm_sev_enabled() in favor of using the global
'sev' flag directly. While sev_hardware_enabled() checks max_sev_asid,
which is true even if KVM setup fails, 'sev' will be true if and only
if KVM setup fully succeeds.

Fixes: 33af3a7ef9e6 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 238eca82 06-Apr-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Allocate SEV command structures on local stack

Use the local stack to "allocate" the structures used to communicate with
the PSP. The largest struct used by KVM, sev_data_launch_secret, clocks
in at 52 bytes, well within the realm of reasonable stack usage. The
smallest structs are a mere 4 bytes, i.e. the pointer for the allocation
is larger than the allocation itself.

Now that the PSP driver plays nice with vmalloc pointers, putting the
data on a virtually mapped stack (CONFIG_VMAP_STACK=y) will not cause
explosions.

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210406224952.4177376-9-seanjc@google.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
[Apply same treatment to PSP migration commands. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 6a443def 15-Apr-2021 Brijesh Singh <brijesh.singh@amd.com>

KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command

The command finalize the guest receiving process and make the SEV guest
ready for the execution.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <d08914dc259644de94e29b51c3b68a13286fc5a3.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 15fb7de1 15-Apr-2021 Brijesh Singh <brijesh.singh@amd.com>

KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command

The command is used for copying the incoming buffer into the
SEV guest memory space.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <c5d0e3e719db7bb37ea85d79ed4db52e9da06257.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# af43cbbf 15-Apr-2021 Brijesh Singh <brijesh.singh@amd.com>

KVM: SVM: Add support for KVM_SEV_RECEIVE_START command

The command is used to create the encryption context for an incoming
SEV guest. The encryption context can be later used by the hypervisor
to import the incoming data into the SEV guest memory space.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <c7400111ed7458eee01007c4d8d57cdf2cbb0fc2.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 5569e2e7 20-Apr-2021 Steve Rutherford <srutherford@google.com>

KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command

After completion of SEND_START, but before SEND_FINISH, the source VMM can
issue the SEND_CANCEL command to stop a migration. This is necessary so
that a cancelled migration can restart with a new target later.

Reviewed-by: Nathan Tempelman <natet@google.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Steve Rutherford <srutherford@google.com>
Message-Id: <20210412194408.2458827-1-srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# fddecf6a 15-Apr-2021 Brijesh Singh <brijesh.singh@amd.com>

KVM: SVM: Add KVM_SEV_SEND_FINISH command

The command is used to finailize the encryption context created with
KVM_SEV_SEND_START command.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <5082bd6a8539d24bc55a1dd63a1b341245bb168f.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# d3d1af85 15-Apr-2021 Brijesh Singh <brijesh.singh@amd.com>

KVM: SVM: Add KVM_SEND_UPDATE_DATA command

The command is used for encrypting the guest memory region using the encryption
context created with KVM_SEV_SEND_START.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by : Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <d6a6ea740b0c668b30905ae31eac5ad7da048bb3.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 4cfdd47d 15-Apr-2021 Brijesh Singh <brijesh.singh@amd.com>

KVM: SVM: Add KVM_SEV SEND_START command

The command is used to create an outgoing SEV guest encryption context.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <2f1686d0164e0f1b3d6a41d620408393e0a48376.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 54526d1f 08-Apr-2021 Nathan Tempelman <natet@google.com>

KVM: x86: Support KVM VMs sharing SEV context

Add a capability for userspace to mirror SEV encryption context from
one vm to another. On our side, this is intended to support a
Migration Helper vCPU, but it can also be used generically to support
other in-guest workloads scheduled by the host. The intention is for
the primary guest and the mirror to have nearly identical memslots.

The primary benefits of this are that:
1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
can't accidentally clobber each other.
2) The VMs can have different memory-views, which is necessary for post-copy
migration (the migration vCPUs on the target need to read and write to
pages, when the primary guest would VMEXIT).

This does not change the threat model for AMD SEV. Any memory involved
is still owned by the primary guest and its initial state is still
attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
to circumvent SEV, they could achieve the same effect by simply attaching
a vCPU to the primary VM.
This patch deliberately leaves userspace in charge of the memslots for the
mirror, as it already has the power to mess with them in the primary guest.

This patch does not support SEV-ES (much less SNP), as it does not
handle handing off attested VMSAs to the mirror.

For additional context, we need a Migration Helper because SEV PSP
migration is far too slow for our live migration on its own. Using
an in-guest migrator lets us speed this up significantly.

Signed-off-by: Nathan Tempelman <natet@google.com>
Message-Id: <20210408223214.2582277-1-natet@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# a3ba26ec 09-Apr-2021 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Make sure GHCB is mapped before updating

Access to the GHCB is mainly in the VMGEXIT path and it is known that the
GHCB will be mapped. But there are two paths where it is possible the GHCB
might not be mapped.

The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
However, if a SIPI is performed without a corresponding AP Reset Hold,
then the GHCB might not be mapped (depending on the previous VMEXIT),
which will result in a NULL pointer dereference.

The svm_complete_emulated_msr() routine will update the GHCB to inform
the caller of a RDMSR/WRMSR operation about any errors. While it is likely
that the GHCB will be mapped in this situation, add a safe guard
in this path to be certain a NULL pointer dereference is not encountered.

Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Message-Id: <a5d3ebb600a91170fc88599d5a575452b3e31036.1617979121.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# eba04b20 30-Mar-2021 Sean Christopherson <seanjc@google.com>

KVM: x86: Account a variety of miscellaneous allocations

Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are
clearly associated with a single task/VM.

Note, there are a several SEV allocations that aren't accounted, but
those can (hopefully) be fixed by using the local stack for memory.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331023025.2485960-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 8727906f 30-Mar-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Do not allow SEV/SEV-ES initialization after vCPUs are created

Reject KVM_SEV_INIT and KVM_SEV_ES_INIT if they are attempted after one
or more vCPUs have been created. KVM assumes a VM is tagged SEV/SEV-ES
prior to vCPU creation, e.g. init_vmcb() needs to mark the VMCB as SEV
enabled, and svm_create_vcpu() needs to allocate the VMSA. At best,
creating vCPUs before SEV/SEV-ES init will lead to unexpected errors
and/or behavior, and at worst it will crash the host, e.g.
sev_launch_update_vmsa() will dereference a null svm->vmsa pointer.

Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command")
Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331031936.2495277-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 9fa1521d 30-Mar-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Do not set sev->es_active until KVM_SEV_ES_INIT completes

Set sev->es_active only after the guts of KVM_SEV_ES_INIT succeeds. If
the command fails, e.g. because SEV is already active or there are no
available ASIDs, then es_active will be left set even though the VM is
not fully SEV-ES capable.

Refactor the code so that "es_active" is passed on the stack instead of
being prematurely shoved into sev_info, both to avoid having to unwind
sev_info and so that it's more obvious what actually consumes es_active
in sev_guest_init() and its helpers.

Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331031936.2495277-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# c36b16d2 30-Mar-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Use online_vcpus, not created_vcpus, to iterate over vCPUs

Use the kvm_for_each_vcpu() helper to iterate over vCPUs when encrypting
VMSAs for SEV, which effectively switches to use online_vcpus instead of
created_vcpus. This fixes a possible null-pointer dereference as
created_vcpus does not guarantee a vCPU exists, since it is updated at
the very beginning of KVM_CREATE_VCPU. created_vcpus exists to allow the
bulk of vCPU creation to run in parallel, while still correctly
restricting the max number of max vCPUs.

Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331031936.2495277-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 63129754 02-Mar-2021 Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: Pass struct kvm_vcpu to exit handlers (and many, many other places)

Refactor the svm_exit_handlers API to pass @vcpu instead of @svm to
allow directly invoking common x86 exit handlers (in a future patch).
Opportunistically convert an absurd number of instances of 'svm->vcpu'
to direct uses of 'vcpu' to avoid pointless casting.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 7aef27f0 29-Mar-2021 Vipin Sharma <vipinsh@google.com>

svm/sev: Register SEV and SEV-ES ASIDs to the misc controller

Secure Encrypted Virtualization (SEV) and Secure Encrypted
Virtualization - Encrypted State (SEV-ES) ASIDs are used to encrypt KVMs
on AMD platform. These ASIDs are available in the limited quantities on
a host.

Register their capacity and usage to the misc controller for tracking
via cgroups.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: David Rientjes <rientjes@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# d9f6e12f 18-Mar-2021 Ingo Molnar <mingo@kernel.org>

x86: Fix various typos in comments

Fix ~144 single-word typos in arch/x86/ code comments.

Doing this in a single commit should reduce the churn.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org


# a7fc06dd 02-Feb-2021 Michael Roth <michael.roth@amd.com>

KVM: SVM: use .prepare_guest_switch() to handle CPU register save/setup

Currently we save host state like user-visible host MSRs, and do some
initial guest register setup for MSR_TSC_AUX and MSR_AMD64_TSC_RATIO
in svm_vcpu_load(). Defer this until just before we enter the guest by
moving the handling to kvm_x86_ops.prepare_guest_switch() similarly to
how it is done for the VMX implementation.

Additionally, since handling of saving/restoring host user MSRs is the
same both with/without SEV-ES enabled, move that handling to common
code.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20210202190126.2185715-4-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 553cc15f 02-Feb-2021 Michael Roth <michael.roth@amd.com>

KVM: SVM: remove uneeded fields from host_save_users_msrs

Now that the set of host user MSRs that need to be individually
saved/restored are the same with/without SEV-ES, we can drop the
.sev_es_restored flag and just iterate through the list unconditionally
for both cases. A subsequent patch can then move these loops to a
common path.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20210202190126.2185715-3-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 35a78319 30-Dec-2020 Sean Christopherson <seanjc@google.com>

KVM: SVM: Use asm goto to handle unexpected #UD on SVM instructions

Add svm_asm*() macros, a la the existing vmx_asm*() macros, to handle
faults on SVM instructions instead of using the generic __ex(), a.k.a.
__kvm_handle_fault_on_reboot(). Using asm goto generates slightly
better code as it eliminates the in-line JMP+CALL sequences that are
needed by __kvm_handle_fault_on_reboot() to avoid triggering BUG()
from fixup (which generates bad stack traces).

Using SVM specific macros also drops the last user of __ex() and the
the last asm linkage to kvm_spurious_fault(), and adds a helper for
VMSAVE, which may gain an addition call site in the future (as part
of optimizing the SVM context switching).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201231002702.2223707-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 2c07ded0 04-Jan-2021 Brijesh Singh <brijesh.singh@amd.com>

KVM/SVM: add support for SEV attestation command

The SEV FW version >= 0.23 added a new command that can be used to query
the attestation report containing the SHA-256 digest of the guest memory
encrypted through the KVM_SEV_LAUNCH_UPDATE_{DATA, VMSA} commands and
sign the report with the Platform Endorsement Key (PEK).

See the SEV FW API spec section 6.8 for more details.

Note there already exist a command (KVM_SEV_LAUNCH_MEASURE) that can be
used to get the SHA-256 digest. The main difference between the
KVM_SEV_LAUNCH_MEASURE and KVM_SEV_ATTESTATION_REPORT is that the latter
can be called while the guest is running and the measurement value is
signed with PEK.

Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: John Allen <john.allen@amd.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: linux-crypto@vger.kernel.org
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: James Bottomley <jejb@linux.ibm.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Message-Id: <20210104151749.30248-1-brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 19a23da5 27-Jan-2021 Peter Gonda <pgonda@google.com>

Fix unsynchronized access to sev members through svm_register_enc_region

Grab kvm->lock before pinning memory when registering an encrypted
region; sev_pin_memory() relies on kvm->lock being held to ensure
correctness when checking and updating the number of pinned pages.

Add a lockdep assertion to help prevent future regressions.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Fixes: 1e80fdc09d12 ("KVM: SVM: Pin guest memory when SEV is active")
Signed-off-by: Peter Gonda <pgonda@google.com>

V2
- Fix up patch description
- Correct file paths svm.c -> sev.c
- Add unlock of kvm->lock on sev_pin_memory error

V1
- https://lore.kernel.org/kvm/20210126185431.1824530-1-pgonda@google.com/

Message-Id: <20210127161524.2832400-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 25009140 22-Jan-2021 Sean Christopherson <seanjc@google.com>

KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest

Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB, the
GRPs' dirty bits are set from time zero and never cleared, i.e. will
always be seen as dirty. The obvious alternative would be to clear
the dirty bits when appropriate, but removing the dirty checks is
desirable as it allows reverting GPR dirty+available tracking, which
adds overhead to all flavors of x86 VMs.

Note, unconditionally writing the GPRs in the GHCB is tacitly allowed
by the GHCB spec, which allows the hypervisor (or guest) to provide
unnecessary info; it's the guest's responsibility to consume only what
it needs (the hypervisor is untrusted after all).

The guest and hypervisor can supply additional state if desired but
must not rely on that additional state being provided.

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210122235049.3107620-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 647daca2 04-Jan-2021 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Add support for booting APs in an SEV-ES guest

Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.

Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.

To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# f65cf84e 18-Dec-2020 Nathan Chancellor <nathan@kernel.org>

KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load

When using LLVM's integrated assembler (LLVM_IAS=1) while building
x86_64_defconfig + CONFIG_KVM=y + CONFIG_KVM_AMD=y, the following build
error occurs:

$ make LLVM=1 LLVM_IAS=1 arch/x86/kvm/svm/sev.o
arch/x86/kvm/svm/sev.c:2004:15: error: too few operands for instruction
asm volatile(__ex("vmsave") : : "a" (__sme_page_pa(sd->save_area)) : "memory");
^
arch/x86/kvm/svm/sev.c:28:17: note: expanded from macro '__ex'
#define __ex(x) __kvm_handle_fault_on_reboot(x)
^
./arch/x86/include/asm/kvm_host.h:1646:10: note: expanded from macro '__kvm_handle_fault_on_reboot'
"666: \n\t" \
^
<inline asm>:2:2: note: instantiated into assembly here
vmsave
^
1 error generated.

This happens because LLVM currently does not support calling vmsave
without the fixed register operand (%rax for 64-bit and %eax for
32-bit). This will be fixed in LLVM 12 but the kernel currently supports
LLVM 10.0.1 and newer so this needs to be handled.

Add the proper register using the _ASM_AX macro, which matches the
vmsave call in vmenter.S.

Fixes: 861377730aa9 ("KVM: SVM: Provide support for SEV-ES vCPU loading")
Link: https://reviews.llvm.org/D93524
Link: https://github.com/ClangBuiltLinux/linux/issues/1216
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Message-Id: <20201219063711.3526947-1-natechancellor@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# d45f89f7 16-Dec-2020 Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: fix 32-bit compilation

VCPU_REGS_R8...VCPU_REGS_R15 are not defined on 32-bit x86,
so cull them from the synchronization of the VMSA.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 8640ca58 14-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting

The GHCB specification requires the hypervisor to save the address of an
AP Jump Table so that, for example, vCPUs that have been parked by UEFI
can be started by the OS. Provide support for the AP Jump Table set/get
exit code.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# ad73109a 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Provide support to launch and run an SEV-ES guest

An SEV-ES guest is started by invoking a new SEV initialization ioctl,
KVM_SEV_ES_INIT. This identifies the guest as an SEV-ES guest, which is
used to drive the appropriate ASID allocation, VMSA encryption, etc.

Before being able to run an SEV-ES vCPU, the vCPU VMSA must be encrypted
and measured. This is done using the LAUNCH_UPDATE_VMSA command after all
calls to LAUNCH_UPDATE_DATA have been performed, but before LAUNCH_MEASURE
has been performed. In order to establish the encrypted VMSA, the current
(traditional) VMSA and the GPRs are synced to the page that will hold the
encrypted VMSA and then LAUNCH_UPDATE_VMSA is invoked. The vCPU is then
marked as having protected guest state.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e9643245adb809caf3a87c09997926d2f3d6ff41.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 86137773 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Provide support for SEV-ES vCPU loading

An SEV-ES vCPU requires additional VMCB vCPU load/put requirements. SEV-ES
hardware will restore certain registers on VMEXIT, but not save them on
VMRUN (see Table B-3 and Table B-4 of the AMD64 APM Volume 2), so make the
following changes:

General vCPU load changes:
- During vCPU loading, perform a VMSAVE to the per-CPU SVM save area and
save the current values of XCR0, XSS and PKRU to the per-CPU SVM save
area as these registers will be restored on VMEXIT.

General vCPU put changes:
- Do not attempt to restore registers that SEV-ES hardware has already
restored on VMEXIT.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <019390e9cb5e93cd73014fa5a040c17d42588733.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 376c6d28 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Provide support for SEV-ES vCPU creation/loading

An SEV-ES vCPU requires additional VMCB initialization requirements for
vCPU creation and vCPU load/put requirements. This includes:

General VMCB initialization changes:
- Set a VMCB control bit to enable SEV-ES support on the vCPU.
- Set the VMCB encrypted VM save area address.
- CRx registers are part of the encrypted register state and cannot be
updated. Remove the CRx register read and write intercepts and replace
them with CRx register write traps to track the CRx register values.
- Certain MSR values are part of the encrypted register state and cannot
be updated. Remove certain MSR intercepts (EFER, CR_PAT, etc.).
- Remove the #GP intercept (no support for "enable_vmware_backdoor").
- Remove the XSETBV intercept since the hypervisor cannot modify XCR0.

General vCPU creation changes:
- Set the initial GHCB gpa value as per the GHCB specification.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <3a8aef366416eddd5556dfa3fdc212aafa1ad0a2.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 80675b3a 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Update ASID allocation to support SEV-ES guests

SEV and SEV-ES guests each have dedicated ASID ranges. Update the ASID
allocation routine to return an ASID in the respective range.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <d7aed505e31e3954268b2015bb60a1486269c780.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 85ca8be9 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Set the encryption mask for the SVM host save area

The SVM host save area is used to restore some host state on VMEXIT of an
SEV-ES guest. After allocating the save area, clear it and add the
encryption mask to the SVM host save area physical address that is
programmed into the VM_HSAVE_PA MSR.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <b77aa28af6d7f1a0cb545959e08d6dc75e0c3cba.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 4444dfe4 14-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Add NMI support for an SEV-ES guest

The GHCB specification defines how NMIs are to be handled for an SEV-ES
guest. To detect the completion of an NMI the hypervisor must not
intercept the IRET instruction (because a #VC while running the NMI will
issue an IRET) and, instead, must receive an NMI Complete exit event from
the guest.

Update the KVM support for detecting the completion of NMIs in the guest
to follow the GHCB specification. When an SEV-ES guest is active, the
IRET instruction will no longer be intercepted. Now, when the NMI Complete
exit event is received, the iret_interception() function will be called
to simulate the completion of the NMI.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <5ea3dd69b8d4396cefdc9048ebc1ab7caa70a847.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 7ed9abfe 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Support string IO operations for an SEV-ES guest

For an SEV-ES guest, string-based port IO is performed to a shared
(un-encrypted) page so that both the hypervisor and guest can read or
write to it and each see the contents.

For string-based port IO operations, invoke SEV-ES specific routines that
can complete the operation using common KVM port IO support.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <9d61daf0ffda496703717218f415cdc8fd487100.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 8f423a80 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Support MMIO for an SEV-ES guest

For an SEV-ES guest, MMIO is performed to a shared (un-encrypted) page
so that both the hypervisor and guest can read or write to it and each
see the contents.

The GHCB specification provides software-defined VMGEXIT exit codes to
indicate a request for an MMIO read or an MMIO write. Add support to
recognize the MMIO requests and invoke SEV-ES specific routines that
can complete the MMIO operation. These routines use common KVM support
to complete the MMIO operation.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <af8de55127d5bcc3253d9b6084a0144c12307d4d.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 59e38b58 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Create trace events for VMGEXIT MSR protocol processing

Add trace events for entry to and exit from VMGEXIT MSR protocol
processing. The vCPU will be common for the trace events. The MSR
protocol processing is guided by the GHCB GPA in the VMCB, so the GHCB
GPA will represent the input and output values for the entry and exit
events, respectively. Additionally, the exit event will contain the
return code for the event.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <c5b3b440c3e0db43ff2fc02813faa94fa54896b0.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# d523ab6b 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Create trace events for VMGEXIT processing

Add trace events for entry to and exit from VMGEXIT processing. The vCPU
id and the exit reason will be common for the trace events. The exit info
fields will represent the input and output values for the entry and exit
events, respectively.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <25357dca49a38372e8f483753fb0c1c2a70a6898.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# e1d71116 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100

The GHCB specification defines a GHCB MSR protocol using the lower
12-bits of the GHCB MSR (in the hypervisor this corresponds to the
GHCB GPA field in the VMCB).

Function 0x100 is a request for termination of the guest. The guest has
encountered some situation for which it has requested to be terminated.
The GHCB MSR value contains the reason for the request.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <f3a1f7850c75b6ea4101e15bbb4a3af1a203f1dc.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# d3694667 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004

The GHCB specification defines a GHCB MSR protocol using the lower
12-bits of the GHCB MSR (in the hypervisor this corresponds to the
GHCB GPA field in the VMCB).

Function 0x004 is a request for CPUID information. Only a single CPUID
result register can be sent per invocation, so the protocol defines the
register that is requested. The GHCB MSR value is set to the CPUID
register value as per the specification via the VMCB GHCB GPA field.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <fd7ee347d3936e484c06e9001e340bf6387092cd.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 1edc1459 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002

The GHCB specification defines a GHCB MSR protocol using the lower
12-bits of the GHCB MSR (in the hypervisor this corresponds to the
GHCB GPA field in the VMCB).

Function 0x002 is a request to set the GHCB MSR value to the SEV INFO as
per the specification via the VMCB GHCB GPA field.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <c23c163a505290a0d1b9efc4659b838c8c902cbc.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 291bd20d 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Add initial support for a VMGEXIT VMEXIT

SEV-ES adds a new VMEXIT reason code, VMGEXIT. Initial support for a
VMGEXIT includes mapping the GHCB based on the guest GPA, which is
obtained from a new VMCB field, and then validating the required inputs
for the VMGEXIT exit reason.

Since many of the VMGEXIT exit reasons correspond to existing VMEXIT
reasons, the information from the GHCB is copied into the VMCB control
exit code areas and KVM register areas. The standard exit handlers are
invoked, similar to standard VMEXIT processing. Before restarting the
vCPU, the GHCB is updated with any registers that have been updated by
the hypervisor.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <c6a4ed4294a369bd75c44d03bd7ce0f0c3840e50.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# add5e2f0 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Add support for the SEV-ES VMSA

Allocate a page during vCPU creation to be used as the encrypted VM save
area (VMSA) for the SEV-ES guest. Provide a flag in the kvm_vcpu_arch
structure that indicates whether the guest state is protected.

When freeing a VMSA page that has been encrypted, the cache contents must
be flushed using the MSR_AMD64_VM_PAGE_FLUSH before freeing the page.

[ i386 build warnings ]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <fde272b17eec804f3b9db18c131262fe074015c5.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 916391a2 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Add support for SEV-ES capability in KVM

Add support to KVM for determining if a system is capable of supporting
SEV-ES as well as determining if a guest is an SEV-ES guest.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e66792323982c822350e40c7a1cf67ea2978a70b.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 9d4747d0 10-Dec-2020 Tom Lendacky <thomas.lendacky@amd.com>

KVM: SVM: Remove the call to sev_platform_status() during setup

When both KVM support and the CCP driver are built into the kernel instead
of as modules, KVM initialization can happen before CCP initialization. As
a result, sev_platform_status() will return a failure when it is called
from sev_hardware_setup(), when this isn't really an error condition.

Since sev_platform_status() doesn't need to be called at this time anyway,
remove the invocation from sev_hardware_setup().

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <618380488358b56af558f2682203786f09a49483.1607620209.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# dee734a7 30-Nov-2020 Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: adjust SEV for commit 7e8e6eed75e

Since the ASID is now stored in svm->asid, pre_sev_run should also place
it there and not directly in the VMCB control area.

Reported-by: Ashish Kalra <Ashish.Kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 854c57f0 10-Nov-2020 Ashish Kalra <ashish.kalra@amd.com>

KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().

Fix offset computation in __sev_dbg_decrypt() to include the
source paddr before it is rounded down to be aligned to 16 bytes
as required by SEV API. This fixes incorrect guest memory dumps
observed when using qemu monitor.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <20201110224205.29444-1-Ashish.Kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 14e3dd8d 23-Sep-2020 Paolo Bonzini <pbonzini@redhat.com>

KVM: SEV: shorten comments around sev_clflush_pages

Very similar content is present in four comments in sev.c. Unfortunately
there are small differences that make it harder to place the comment
in sev_clflush_pages itself, but at least we can make it more concise.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 50085bee 07-Aug-2020 Cfir Cohen <cfir@google.com>

KVM: SVM: Mark SEV launch secret pages as dirty.

The LAUNCH_SECRET command performs encryption of the
launch secret memory contents. Mark pinned pages as
dirty, before unpinning them.
This matches the logic in sev_launch_update_data().

Signed-off-by: Cfir Cohen <cfir@google.com>
Message-Id: <20200808003746.66687-1-cfir@google.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# e1ebb2b4 17-Sep-2020 Krish Sadhukhan <krish.sadhukhan@oracle.com>

KVM: SVM: Don't flush cache if hardware enforces cache coherency across encryption domains

In some hardware implementations, coherency between the encrypted and
unencrypted mappings of the same physical page in a VM is enforced. In
such a system, it is not required for software to flush the VM's page
from all CPU caches in the system prior to changing the value of the
C-bit for the page.

So check that bit before flushing the cache.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200917212038.5090-4-krish.sadhukhan@oracle.com


# 7be74942 25-Aug-2020 David Rientjes <rientjes@google.com>

KVM: SVM: Periodically schedule when unregistering regions on destroy

There may be many encrypted regions that need to be unregistered when a
SEV VM is destroyed. This can lead to soft lockups. For example, on a
host running 4.15:

watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348]
CPU: 206 PID: 194348 Comm: t_virtual_machi
RIP: 0010:free_unref_page_list+0x105/0x170
...
Call Trace:
[<0>] release_pages+0x159/0x3d0
[<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd]
[<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd]
[<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd]
[<0>] kvm_arch_destroy_vm+0x47/0x200
[<0>] kvm_put_kvm+0x1a8/0x2f0
[<0>] kvm_vm_release+0x25/0x30
[<0>] do_exit+0x335/0xc10
[<0>] do_group_exit+0x3f/0xa0
[<0>] get_signal+0x1bc/0x670
[<0>] do_signal+0x31/0x130

Although the CLFLUSH is no longer issued on every encrypted region to be
unregistered, there are no other changes that can prevent soft lockups for
very large SEV VMs in the latest kernel.

Periodically schedule if necessary. This still holds kvm->lock across the
resched, but since this only happens when the VM is destroyed this is
assumed to be acceptable.

Signed-off-by: David Rientjes <rientjes@google.com>
Message-Id: <alpine.DEB.2.23.453.2008251255240.2987727@chino.kir.corp.google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# ff2bd9ff 14-Jul-2020 Dan Carpenter <dan.carpenter@oracle.com>

KVM: SVM: Fix sev_pin_memory() error handling

The sev_pin_memory() function was modified to return error pointers
instead of NULL but there are two problems. The first problem is that
if "npages" is zero then it still returns NULL. Secondly, several of
the callers were not updated to check for error pointers instead of
NULL.

Either one of these issues will lead to an Oops.

Fixes: a8d908b5873c ("KVM: x86: report sev_pin_memory errors with PTR_ERR")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Message-Id: <20200714142351.GA315374@mwanda>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 06e7852c 25-Jun-2020 Joerg Roedel <jroedel@suse.de>

KVM: SVM: Add vmcb_ prefix to mark_*() functions

Make it more clear what data structure these functions operate on.

No functional changes.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Message-Id: <20200625080325.28439-3-joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 8a14fe4f 03-Jun-2020 Jim Mattson <jmattson@google.com>

kvm: x86: Move last_cpu into kvm_vcpu_arch as last_vmentry_cpu

Both the vcpu_vmx structure and the vcpu_svm structure have a
'last_cpu' field. Move the common field into the kvm_vcpu_arch
structure. For clarity, rename it to 'last_vmentry_cpu.'

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200603235623.245638-6-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 24263634 03-Jun-2020 Jim Mattson <jmattson@google.com>

kvm: svm: Always set svm->last_cpu on VMRUN

Previously, this field was only set when using SEV. Set it for all
vCPU configurations, so that it can be communicated to userspace for
diagnosing potential hardware errors.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200603235623.245638-3-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# a8d908b5 23-Jun-2020 Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: report sev_pin_memory errors with PTR_ERR

Callers of sev_pin_memory() treat
NULL differently:

sev_launch_secret()/svm_register_enc_region() return -ENOMEM
sev_dbg_crypt() returns -EFAULT.

Switching to ERR_PTR() preserves the error and enables cleaner reporting of
different kinds of failures.

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# dc42c8ae 26-May-2020 John Hubbard <jhubbard@nvidia.com>

KVM: SVM: convert get_user_pages() --> pin_user_pages()

This code was using get_user_pages*(), in a "Case 2" scenario
(DMA/RDMA), using the categorization from [1]. That means that it's
time to convert the get_user_pages*() + put_page() calls to
pin_user_pages*() + unpin_user_pages() calls.

There is some helpful background in [2]: basically, this is a small
part of fixing a long-standing disconnect between pinning pages, and
file systems' use of those pages.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Message-Id: <20200526062207.1360225-3-jhubbard@nvidia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 78824fab 26-May-2020 John Hubbard <jhubbard@nvidia.com>

KVM: SVM: fix svn_pin_memory()'s use of get_user_pages_fast()

There are two problems in svn_pin_memory():

1) The return value of get_user_pages_fast() is stored in an
unsigned long, although the declared return value is of type int.
This will not cause any symptoms, but it is misleading.
Fix this by changing the type of npinned to "int".

2) The number of pages passed into get_user_pages_fast() is stored
in an unsigned long, even though get_user_pages_fast() accepts an
int. This means that it is possible to silently overflow the number
of pages.

Fix this by adding a WARN_ON_ONCE() and an early error return. The
npages variable is left as an unsigned long for convenience in
checking for overflow.

Fixes: 89c505809052 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_UPDATE_DATA command")
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Message-Id: <20200526062207.1360225-2-jhubbard@nvidia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 88dca4ca 01-Jun-2020 Christoph Hellwig <hch@lst.de>

mm: remove the pgprot argument to __vmalloc

The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv]
Acked-by: Gao Xiang <xiang@kernel.org> [erofs]
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 996ed22c 07-May-2020 Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>

arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()

When trying to lock read-only pages, sev_pin_memory() fails because
FOLL_WRITE is used as the flag for get_user_pages_fast().

Commit 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a
write 'bool'") updated the get_user_pages_fast() call sites to use
flags, but incorrectly updated the call in sev_pin_memory(). As the
original coding of this call was correct, revert the change made by that
commit.

Fixes: 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'")
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b2bce0a5 11-Apr-2020 Borislav Petkov <bp@suse.de>

KVM: SVM: Fix build error due to missing release_pages() include

Fix:

arch/x86/kvm/svm/sev.c: In function ‘sev_pin_memory’:
arch/x86/kvm/svm/sev.c:360:3: error: implicit declaration of function ‘release_pages’;\
did you mean ‘reclaim_pages’? [-Werror=implicit-function-declaration]
360 | release_pages(pages, npinned);
| ^~~~~~~~~~~~~
| reclaim_pages

because svm.c includes pagemap.h but the carved out sev.c needs it too.
Triggered by a randconfig build.

Fixes: eaf78265a4ab ("KVM: SVM: Move SEV code to separate file")
Signed-off-by: Borislav Petkov <bp@suse.de>
Message-Id: <20200411160927.27954-1-bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 9ef1530c 13-Apr-2020 Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: fix compilation with modular PSP and non-modular KVM

Use svm_sev_enabled() in order to cull all calls to PSP code. Otherwise,
compilation fails with undefined symbols if the PSP device driver is compiled
as a module and KVM is not.

Reported-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# eaf78265 24-Mar-2020 Joerg Roedel <jroedel@suse.de>

KVM: SVM: Move SEV code to separate file

Move the SEV specific parts of svm.c into the new sev.c file.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Message-Id: <20200324094154.32352-5-joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>