History log of /linux-master/arch/x86/hyperv/hv_init.c
Revision Date Author Comments
# 0e3f7d12 20-Feb-2024 Nuno Das Neves <nunodasneves@linux.microsoft.com>

hyperv-tlfs: Change prefix of generic HV_REGISTER_* MSRs to HV_MSR_*

The HV_REGISTER_ defines are used as arguments to hv_set/get_register(), which
delegate to arch-specific mechanisms for getting/setting synthetic
Hyper-V MSRs.

On arm64, HV_REGISTER_ defines are synthetic VP registers accessed via
the get/set vp registers hypercalls. The naming matches the TLFS
document, although these register names are not specific to arm64.

However, on x86 the prefix HV_REGISTER_ indicates Hyper-V MSRs accessed
via rdmsrl()/wrmsrl(). This is not consistent with the TLFS doc, where
HV_REGISTER_ is *only* used for VP register names used by
the get/set register hypercalls.

To fix this inconsistency and prevent future confusion, change the
arch-generic aliases used by callers of hv_set/get_register() to have
the prefix HV_MSR_ instead of HV_REGISTER_.

Use the prefix HV_X64_MSR_ for the x86-only Hyper-V MSRs. On x86, the
generic HV_MSR_'s point to the corresponding HV_X64_MSR_.

Move the arm64 HV_REGISTER_* defines to the asm-generic hyperv-tlfs.h,
since these are not specific to arm64. On arm64, the generic HV_MSR_'s
point to the corresponding HV_REGISTER_.

While at it, rename hv_get/set_registers() and related functions to
hv_get/set_msr(), hv_get/set_nested_msr(), etc. These are only used for
Hyper-V MSRs and this naming makes that clear.

Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/1708440933-27125-1-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1708440933-27125-1-git-send-email-nunodasneves@linux.microsoft.com>
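
The aliasing scheme described in 0e3f7d12 can be pictured with a small user-space sketch. It is not the kernel's header: SKETCH_ARCH_ARM64 and sketch_vp_index_reg() are hypothetical names invented here, and the two numeric values are quoted from memory of the TLFS and should be checked against the real headers.

```c
#include <stdint.h>

/* x86-only name: a synthetic MSR number (value assumed from the TLFS). */
#define HV_X64_MSR_VP_INDEX	0x40000002

/* Arch-generic VP register name used by the get/set register hypercalls
 * (value assumed from the TLFS). */
#define HV_REGISTER_VP_INDEX	0x00090003

/* SKETCH_ARCH_ARM64 is a hypothetical switch standing in for the real
 * per-architecture headers. */
#ifdef SKETCH_ARCH_ARM64
#define HV_MSR_VP_INDEX		HV_REGISTER_VP_INDEX
#else
#define HV_MSR_VP_INDEX		HV_X64_MSR_VP_INDEX
#endif

/* Callers only ever name the generic HV_MSR_ alias. */
static uint32_t sketch_vp_index_reg(void)
{
	return HV_MSR_VP_INDEX;
}
```

Built without SKETCH_ARCH_ARM64, the generic alias resolves to the x86 MSR number; with it, the same caller resolves to the VP register name.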


# 7e8037b0 11-Nov-2023 Saurabh Sengar <ssengar@linux.microsoft.com>

x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM

A Gen2 VM doesn't support legacy PCI/PCIe, so both raw_pci_ops and
raw_pci_ext_ops are NULL, and pci_subsys_init() -> pcibios_init()
doesn't call pcibios_resource_survey() -> e820__reserve_resources_late();
as a result, any emulated persistent memory of E820_TYPE_PRAM (12) via
the kernel parameter memmap=nn[KMG]!ss is not added into iomem_resource
and hence can't be detected by register_e820_pmem().

Fix this by directly calling e820__reserve_resources_late() in
hv_pci_init(), which is called from arch_initcall(pci_arch_init).

It's ok to move a Gen2 VM's e820__reserve_resources_late() from
subsys_initcall(pci_subsys_init) to arch_initcall(pci_arch_init) because
the code in-between doesn't depend on the E820 resources.
e820__reserve_resources_late() depends on e820__reserve_resources(),
which has been called earlier from setup_arch().

For a Gen-2 VM, the new hv_pci_init() also adds any memory of
E820_TYPE_PMEM (7) into iomem_resource, and acpi_nfit_register_region() ->
acpi_nfit_insert_resource() -> region_intersects() returns
REGION_INTERSECTS, so the memory of E820_TYPE_PMEM won't get added twice.

Changed the local variable "int gen2vm" to "bool gen2vm".

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1699691867-9827-1-git-send-email-ssengar@linux.microsoft.com>


# 203a521b 19-Sep-2023 Saurabh Sengar <ssengar@linux.microsoft.com>

x86/hyperv: Add common print prefix "Hyper-V" in hv_init

Add "#define pr_fmt()" in hv_init.c to use "Hyper-V:" as common
print prefix for all pr_*() statements in this file.

Remove the "Hyper-V:" prefix already present in a couple of prints.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/1695123361-8877-1-git-send-email-ssengar@linux.microsoft.com
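
As a rough illustration of the pr_fmt() mechanism used in 203a521b, the user-space sketch below prepends a common prefix at compile time through string-literal concatenation. sketch_pr_info() and sketch_report_cpu() are hypothetical stand-ins for the kernel's pr_*() helpers, and the message text is made up for the example.

```c
#include <stdio.h>

/* Same shape as the kernel idiom: every format string gets the prefix. */
#define pr_fmt(fmt) "Hyper-V: " fmt

/* Hypothetical stand-in for pr_info(); writes into a caller buffer
 * instead of the kernel log. Needs at least one format argument. */
#define sketch_pr_info(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), __VA_ARGS__)

/* Formats an illustrative message and returns the resulting line. */
static const char *sketch_report_cpu(int cpu)
{
	static char line[64];

	sketch_pr_info(line, sizeof(line), "example message for CPU %d", cpu);
	return line;
}
```

Because the prefix is added by the macro, individual call sites no longer need to spell out "Hyper-V:" themselves.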


# 14058f72 21-Sep-2023 Saurabh Sengar <ssengar@linux.microsoft.com>

x86/hyperv: Remove hv_vtl_early_init initcall

There have been cases reported where HYPERV_VTL_MODE is enabled by mistake
on a non-Hyper-V platform. This causes the hv_vtl_early_init function to
be called on a non-Hyper-V/VTL platform, which results in memory
corruption.

Remove the early_initcall for hv_vtl_early_init and call it at the end of
hyperv_init to make sure it is never called on a non-Hyper-V platform by
mistake.

Reported-by: Mathias Krause <minipli@grsecurity.net>
Closes: https://lore.kernel.org/lkml/40467722-f4ab-19a5-4989-308225b1f9f0@grsecurity.net/
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Acked-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/1695358720-27681-1-git-send-email-ssengar@linux.microsoft.com


# f2a55d08 19-Sep-2023 Saurabh Sengar <ssengar@linux.microsoft.com>

x86/hyperv: Restrict get_vtl to only VTL platforms

When Linux runs in a non-default VTL (CONFIG_HYPERV_VTL_MODE=y),
get_vtl() must never fail as its return value is used in negotiations
with the host. In the more generic case, (CONFIG_HYPERV_VTL_MODE=n) the
VTL is always zero so there's no need to do the hypercall.

Make get_vtl() BUG() in case of failure and put the implementation under
"if IS_ENABLED(CONFIG_HYPERV_VTL_MODE)" to avoid the call altogether in
the most generic use case.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/1695182675-13405-1-git-send-email-ssengar@linux.microsoft.com
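
The split described in f2a55d08 might look roughly like the following user-space sketch. SKETCH_HYPERV_VTL_MODE is a hypothetical stand-in for CONFIG_HYPERV_VTL_MODE, sketch_query_vtl_from_host() is a placeholder for the real hypercall, and abort() stands in for BUG().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for CONFIG_HYPERV_VTL_MODE; left undefined here
 * to model the generic (VTL 0) build. */
/* #define SKETCH_HYPERV_VTL_MODE 1 */

#ifdef SKETCH_HYPERV_VTL_MODE
/* Placeholder for the hypercall; returns 0 on success. */
static int sketch_query_vtl_from_host(uint8_t *vtl)
{
	*vtl = 2;	/* arbitrary example value */
	return 0;
}

static uint8_t sketch_get_vtl(void)
{
	uint8_t vtl;

	/* Failure is not tolerated: the value feeds host negotiation. */
	if (sketch_query_vtl_from_host(&vtl) != 0)
		abort();
	return vtl;
}
#else
/* Generic case: the VTL is always zero, so no hypercall is made. */
static uint8_t sketch_get_vtl(void)
{
	return 0;
}
#endif
```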


# e3131f1c 24-Aug-2023 Dexuan Cui <decui@microsoft.com>

x86/hyperv: Remove hv_isolation_type_en_snp

In ms_hyperv_init_platform(), do not distinguish between a SNP VM with
the paravisor and a SNP VM without the paravisor.

Replace hv_isolation_type_en_snp() with
!ms_hyperv.paravisor_present && hv_isolation_type_snp().

The hv_isolation_type_en_snp() in drivers/hv/hv.c and
drivers/hv/hv_common.c can be changed to hv_isolation_type_snp() since
we know !ms_hyperv.paravisor_present is true there.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20230824080712.30327-10-decui@microsoft.com


# b9b4fe3a 24-Aug-2023 Dexuan Cui <decui@microsoft.com>

x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor

When the paravisor is present, a SNP VM must use GHCB to access some
special MSRs, including HV_X64_MSR_GUEST_OS_ID and some SynIC MSRs.

Similarly, when the paravisor is present, a TDX VM must use TDX GHCI
to access the same MSRs.

Implement hv_tdx_msr_write() and hv_tdx_msr_read(), and use the helper
functions hv_ivm_msr_read() and hv_ivm_msr_write() to access the MSRs
in a unified way for SNP/TDX VMs with the paravisor.

Do not export hv_tdx_msr_write() and hv_tdx_msr_read(), because we never
really used hv_ghcb_msr_write() and hv_ghcb_msr_read() in any module.

Update arch/x86/include/asm/mshyperv.h so that the kernel can still build
if CONFIG_AMD_MEM_ENCRYPT or CONFIG_INTEL_TDX_GUEST is not set, or
neither is set.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20230824080712.30327-9-decui@microsoft.com


# 23378295 24-Aug-2023 Dexuan Cui <decui@microsoft.com>

Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor

The post_msg_page was removed in
commit 9a6b1a170ca8 ("Drivers: hv: vmbus: Remove the per-CPU post_msg_page")

However, it turns out that we need to bring it back, but only for a TDX VM
with the paravisor: in such a VM, the hyperv_pcpu_input_arg is not decrypted,
but the HVCALL_POST_MESSAGE in such a VM needs a decrypted page as the
hypercall input page: see the comments in hyperv_init() for a detailed
explanation.

Except for HVCALL_POST_MESSAGE and HVCALL_SIGNAL_EVENT, the other hypercalls
in a TDX VM with the paravisor still use hv_hypercall_pg and must use the
hyperv_pcpu_input_arg (which is encrypted in such a VM), when a hypercall
input page is used.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20230824080712.30327-8-decui@microsoft.com


# d3a9d7e4 24-Aug-2023 Dexuan Cui <decui@microsoft.com>

x86/hyperv: Introduce a global variable hyperv_paravisor_present

The new variable hyperv_paravisor_present is set only when the VM
is a SNP/TDX VM with the paravisor running: see ms_hyperv_init_platform().

We introduce hyperv_paravisor_present because we can not use
ms_hyperv.paravisor_present in arch/x86/include/asm/mshyperv.h:

struct ms_hyperv_info is defined in include/asm-generic/mshyperv.h, which
is included at the end of arch/x86/include/asm/mshyperv.h, but at the
beginning of arch/x86/include/asm/mshyperv.h, we would already need to use
struct ms_hyperv_info in hv_do_hypercall().

We use hyperv_paravisor_present only in include/asm-generic/mshyperv.h,
and use ms_hyperv.paravisor_present elsewhere. In the future, we'll
introduce a hypercall function structure for different VM types, and
at boot time, the right function pointers would be written into the
structure so that runtime testing of TDX vs. SNP vs. normal will be
avoided and hyperv_paravisor_present will no longer be needed.

Call hv_vtom_init() when it's a VBS VM or when ms_hyperv.paravisor_present
is true, i.e. the VM is a SNP VM or TDX VM with the paravisor.

Enhance hv_vtom_init() for a TDX VM with the paravisor.

In hv_common_cpu_init(), don't decrypt the hyperv_pcpu_input_arg
for a TDX VM with the paravisor, just like we don't decrypt the page
for a SNP VM with the paravisor.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20230824080712.30327-7-decui@microsoft.com


# 68f2f2bc 24-Aug-2023 Dexuan Cui <decui@microsoft.com>

Drivers: hv: vmbus: Support fully enlightened TDX guests

Add Hyper-V specific code so that a fully enlightened TDX guest (i.e.
without the paravisor) can run on Hyper-V:
Don't use hv_vp_assist_page. Use GHCI instead.
Don't try to use the unsupported HV_REGISTER_CRASH_CTL.
Don't trust (use) Hyper-V's TLB-flushing hypercalls.
Don't use lazy EOI.
Share the SynIC Event/Message pages with the hypervisor.
Don't use the Hyper-V TSC page for now, because non-trivial work is
required to share the page with the hypervisor.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20230824080712.30327-4-decui@microsoft.com


# d6e0228d 24-Aug-2023 Dexuan Cui <decui@microsoft.com>

x86/hyperv: Support hypercalls for fully enlightened TDX guests

A fully enlightened TDX guest on Hyper-V (i.e. without the paravisor) only
uses the GHCI call rather than hv_hypercall_pg. Do not initialize
hypercall_pg for such a guest.

In hv_common_cpu_init(), the hyperv_pcpu_input_arg page needs to be
decrypted in such a guest.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20230824080712.30327-3-decui@microsoft.com


# b1310355 18-Aug-2023 Tianyu Lan <tiala@microsoft.com>

x86/hyperv: Mark Hyper-V vp assist page unencrypted in SEV-SNP enlightened guest

The Hyper-V VP assist page needs to be shared between the SEV-SNP guest
and Hyper-V, so mark the page unencrypted in the SEV-SNP guest.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20230818102919.1318039-4-ltykernel@gmail.com


# 8387ce06 18-Aug-2023 Tianyu Lan <tiala@microsoft.com>

x86/hyperv: Set Virtual Trust Level in VMBus init message

SEV-SNP guests on Hyper-V can run at multiple Virtual Trust
Levels (VTL). During boot, get the VTL at which we're running
using the GET_VP_REGISTERs hypercall, and save the value
for future use. Then during VMBus initialization, set the VTL
with the saved value as required in the VMBus init message.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20230818102919.1318039-3-ltykernel@gmail.com


# 670c04ad 09-Aug-2023 Dave Hansen <dave.hansen@linux.intel.com>

x86/apic: Nuke ack_APIC_irq()

Yet another wrapper of a wrapper gone along with the outdated comment
that this compiles to a single instruction.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)


# d5ace2a7 21-Jul-2023 Michael Kelley <mikelley@microsoft.com>

x86/hyperv: Disable IBT when hypercall page lacks ENDBR instruction

On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs
with ConfigVersion 9.3 or later support IBT in the guest. However,
current versions of Hyper-V have a bug in that there's not an ENDBR64
instruction at the beginning of the hypercall page. Since hypercalls are
made with an indirect call to the hypercall page, all hypercall attempts
fail with an exception and Linux panics.

A Hyper-V fix is in progress to add ENDBR64. But guard against the Linux
panic by clearing X86_FEATURE_IBT if the hypercall page doesn't start
with ENDBR. The VM will boot and run without IBT.

If future Linux 32-bit kernels were to support IBT, additional hypercall
page hackery would be needed to make IBT work for such kernels in a
Hyper-V VM.

Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1690001476-98594-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
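
The check described in d5ace2a7 can be sketched in user space as below. ENDBR64 is encoded as the bytes F3 0F 1E FA; the function names are hypothetical, and the boolean return value merely models clearing X86_FEATURE_IBT rather than reproducing the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* ENDBR64 instruction encoding: F3 0F 1E FA. */
static const uint8_t sketch_endbr64[4] = { 0xf3, 0x0f, 0x1e, 0xfa };

/* True if the first instruction of the page is ENDBR64. */
static bool sketch_page_starts_with_endbr64(const uint8_t *page)
{
	return memcmp(page, sketch_endbr64, sizeof(sketch_endbr64)) == 0;
}

/* Models the decision: IBT stays usable only if the CPU has it and an
 * indirect call into the hypercall page would land on ENDBR64. */
static bool sketch_ibt_usable(bool cpu_has_ibt, const uint8_t *hypercall_page)
{
	return cpu_has_ibt && sketch_page_starts_with_endbr64(hypercall_page);
}
```

When the helper returns false on IBT-capable hardware, the sketch corresponds to the VM booting and running without IBT.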


# 9636be85 23-May-2023 Michael Kelley <mikelley@microsoft.com>

x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline

These commits

a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg")
2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs")

update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg
because that memory will be correctly marked as decrypted or encrypted
for all VM types (CoCo or normal). But problems ensue when CPUs in the
VM go online or offline after virtual PCI devices have been configured.

When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is
initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN.
But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which
may call the virtual PCI driver and fault trying to use the as yet
uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo
VM if the MMIO read and write hypercalls are used from state
CPUHP_AP_IRQ_AFFINITY_ONLINE.

When a CPU is taken offline, IRQs may be reassigned in state
CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to
use the hyperv_pcpu_input_arg that has already been freed by a
higher state.

Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE
immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE)
and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for
Hyper-V initialization so that hyperv_pcpu_input_arg is allocated
early enough.

Fix the offlining problem by not freeing hyperv_pcpu_input_arg when
a CPU goes offline. Retain the allocated memory, and reuse it if
the CPU comes back online later.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/1684862062-51576-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
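
The two fixes in 9636be85 can be modeled with a toy user-space sketch. The enum lists only the states named in the message, in the relative order the message describes (the real enum has many more entries), and the per-CPU array, SKETCH_NR_CPUS, and function names are hypothetical.

```c
#include <stdlib.h>

/* Subset of hotplug states, in the relative order the commit describes. */
enum sketch_cpuhp_state {
	SKETCH_AP_ONLINE_IDLE,
	SKETCH_AP_HYPERV_ONLINE,	/* new state added by the fix */
	SKETCH_AP_IRQ_AFFINITY_ONLINE,
	SKETCH_AP_ONLINE_DYN,		/* where the init used to run */
};

/* Online callbacks run in ascending state order. */
static int sketch_runs_before(enum sketch_cpuhp_state a,
			      enum sketch_cpuhp_state b)
{
	return a < b;
}

#define SKETCH_NR_CPUS 4
static void *sketch_input_arg[SKETCH_NR_CPUS];

/* Onlining allocates the per-CPU page only if it is not already there. */
static int sketch_cpu_online(int cpu)
{
	if (!sketch_input_arg[cpu])
		sketch_input_arg[cpu] = calloc(1, 4096);
	return sketch_input_arg[cpu] ? 0 : -1;
}

/* Offlining deliberately keeps the page so late users cannot fault. */
static void sketch_cpu_offline(int cpu)
{
	(void)cpu;
}
```

Keeping the allocation across an offline/online cycle trades a small amount of memory for not having to order the free against IRQ reassignment.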


# 6afd9dc1 26-Mar-2023 Michael Kelley <mikelley@microsoft.com>

Drivers: hv: Don't remap addresses that are above shared_gpa_boundary

With the vTOM bit now treated as a protection flag and not part of
the physical address, avoid remapping physical addresses with vTOM set
since technically such addresses aren't valid. Use ioremap_cache()
instead of memremap() to ensure that the mapping provides decrypted
access, which will correctly set the vTOM bit as a protection flag.

While this change is not required for correctness with the current
implementation of memremap(), for general code hygiene it's better to
not depend on the mapping functions doing something reasonable with
a physical address that is out-of-range.

While here, fix typos in two error messages.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/1679838727-87310-12-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# 812b0597 26-Mar-2023 Michael Kelley <mikelley@microsoft.com>

x86/hyperv: Change vTOM handling to use standard coco mechanisms

Hyper-V guests on AMD SEV-SNP hardware have the option of using the
"virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP
architecture. With vTOM, shared vs. private memory accesses are
controlled by splitting the guest physical address space into two
halves.

vTOM is the dividing line where the uppermost bit of the physical
address space is set; e.g., with 47 bits of guest physical address
space, vTOM is 0x400000000000 (bit 46 is set). Guest physical memory is
accessible at two parallel physical addresses -- one below vTOM and one
above vTOM. Accesses below vTOM are private (encrypted) while accesses
above vTOM are shared (decrypted). In this sense, vTOM is like the
GPA.SHARED bit in Intel TDX.

Support for Hyper-V guests using vTOM was added to the Linux kernel in
two patch sets[1][2]. This support treats the vTOM bit as part of
the physical address. For accessing shared (decrypted) memory, these
patch sets create a second kernel virtual mapping that maps to physical
addresses above vTOM.

A better approach is to treat the vTOM bit as a protection flag, not
as part of the physical address. This new approach is like the approach
for the GPA.SHARED bit in Intel TDX. Rather than creating a second kernel
virtual mapping, the existing mapping is updated using recently added
coco mechanisms.

When memory is changed between private and shared using
set_memory_decrypted() and set_memory_encrypted(), the PTEs for the
existing kernel mapping are changed to add or remove the vTOM bit in the
guest physical address, just as with TDX. The hypercalls to change the
memory status on the host side are made using the existing callback
mechanism. Everything just works, with a minor tweak to map the IO-APIC
to use private accesses.

To accomplish the switch in approach, the following must be done:

* Update Hyper-V initialization to set the cc_mask based on vTOM
and do other coco initialization.

* Update physical_mask so the vTOM bit is no longer treated as part
of the physical address

* Remove CC_VENDOR_HYPERV and merge the associated vTOM functionality
under CC_VENDOR_AMD. Update cc_mkenc() and cc_mkdec() to set/clear
the vTOM bit as a protection flag.

* Code already exists to make hypercalls to inform Hyper-V about pages
changing between shared and private. Update this code to run as a
callback from __set_memory_enc_pgtable().

* Remove the Hyper-V special case from __set_memory_enc_dec()

* Remove the Hyper-V specific call to swiotlb_update_mem_attributes()
since mem_encrypt_init() will now do it.

* Add a Hyper-V specific implementation of the is_private_mmio()
callback that returns true for the IO-APIC and vTPM MMIO addresses

[1] https://lore.kernel.org/all/20211025122116.264793-1-ltykernel@gmail.com/
[2] https://lore.kernel.org/all/20211213071407.314309-1-ltykernel@gmail.com/

[ bp: Touchups. ]

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/1679838727-87310-7-git-send-email-mikelley@microsoft.com
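
The arithmetic behind the vTOM description in 812b0597 can be checked with a short sketch. The helper names are hypothetical and only mirror the idea of cc_mkdec()/cc_mkenc() setting or clearing the vTOM bit as a protection flag; they are not the kernel implementations.

```c
#include <stdint.h>

/* vTOM is the uppermost bit of the guest physical address space. */
static uint64_t sketch_vtom(unsigned int gpa_bits)
{
	return 1ULL << (gpa_bits - 1);
}

/* Shared (decrypted) access: set the vTOM bit. */
static uint64_t sketch_mkdec(uint64_t pa, uint64_t vtom)
{
	return pa | vtom;
}

/* Private (encrypted) access: clear the vTOM bit. */
static uint64_t sketch_mkenc(uint64_t pa, uint64_t vtom)
{
	return pa & ~vtom;
}
```

With 47 bits of guest physical address space this gives the 0x400000000000 value quoted in the message, and the same page is reachable at the address with and without that bit.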


# 32c97d98 25-Nov-2022 Gaurav Kohli <gauravkohli@linux.microsoft.com>

x86/hyperv: Remove unregister syscore call from Hyper-V cleanup

The Hyper-V cleanup code runs in the panic path, where preemption and IRQs
are already disabled. Calling unregister_syscore_ops() there might schedule
out the thread even when the mutex is free:

hyperv_cleanup
  unregister_syscore_ops
    mutex_lock(&syscore_ops_lock)
      might_sleep

Here might_sleep() might schedule out this thread when voluntary preemption
is configured, and the thread never comes back. The unregister call was
added earlier only to maintain symmetry, which is not required because this
code is reached only in the crash shutdown path.

To prevent this, remove the unregister_syscore_ops() call.

Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1669443291-2575-1-git-send-email-gauravkohli@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# 0408f16b 04-Nov-2022 Stanislav Kinsburskiy <stanislav.kinsburskiy@gmail.com>

clocksource: hyper-v: Add TSC page support for root partition

Microsoft Hypervisor root partition has to map the TSC page specified
by the hypervisor, instead of providing the page to the hypervisor like
it's done in the guest partitions.

However, it's too early to map the page when the clock is initialized, so
the actual mapping happens later.

Signed-off-by: Stanislav Kinsburskiy <stanislav.kinsburskiy@gmail.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Wei Liu <wei.liu@kernel.org>
CC: Dexuan Cui <decui@microsoft.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: x86@kernel.org
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Daniel Lezcano <daniel.lezcano@linaro.org>
CC: linux-hyperv@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Link: https://lore.kernel.org/r/166759443644.385891.15921594265843430260.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# ee681541 03-Nov-2022 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyperv: Restore VP assist page after cpu offlining/onlining

Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing
to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under
'if (*hvp)' condition. This works for root partition as hv_cpu_die()
does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL but breaks
non-root partitions as hv_cpu_die() doesn't free 'hv_vp_assist_page[cpu]'
for them. This causes VP assist page to remain unset after CPU
offline/online cycle:

$ rdmsr -p 24 0x40000073
10212f001
$ echo 0 > /sys/devices/system/cpu/cpu24/online
$ echo 1 > /sys/devices/system/cpu/cpu24/online
$ rdmsr -p 24 0x40000073
0

Fix the issue by always writing to HV_X64_MSR_VP_ASSIST_PAGE in
hv_cpu_init(). Note, checking 'if (!*hvp)', for root partition is
pointless as hv_cpu_die() always sets 'hv_vp_assist_page[cpu]' to
NULL (and it's also NULL initially).

Note: the fact that 'hv_vp_assist_page[cpu]' is reset to NULL may
present a (potential) issue for KVM. While Hyper-V uses
CPUHP_AP_ONLINE_DYN stage in CPU hotplug, KVM uses CPUHP_AP_KVM_STARTING
which comes earlier in CPU teardown sequence. It is theoretically
possible that Enlightened VMCS is still in use. It is unclear if the
issue is real and if using KVM with Hyper-V root partition is even
possible.

While on it, drop the unneeded smp_processor_id() call from hv_cpu_init().

Fixes: e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing to VP assist page MSR")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20221103190601.399343-1-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
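
To read the rdmsr output quoted in ee681541, the sketch below decodes a VP assist page MSR value. It assumes the layout as I understand it from the TLFS (bit 0 is the enable bit, the page frame number starts at bit 12); the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 0 = enable, bits 12 and up = page frame number. */
#define SKETCH_VP_ASSIST_ENABLE		0x1ULL
#define SKETCH_VP_ASSIST_ADDR_SHIFT	12

static bool sketch_vp_assist_enabled(uint64_t msr)
{
	return (msr & SKETCH_VP_ASSIST_ENABLE) != 0;
}

static uint64_t sketch_vp_assist_pfn(uint64_t msr)
{
	return msr >> SKETCH_VP_ASSIST_ADDR_SHIFT;
}
```

Under that assumption, the first reading (10212f001) is an enabled page at frame 0x10212f, while the 0 read back after the offline/online cycle means no page is set.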


# 2982635a 27-Oct-2022 Anirudh Rayabharam <anrayabh@linux.microsoft.com>

x86/hyperv: fix invalid writes to MSRs during root partition kexec

hyperv_cleanup resets the hypercall page by setting the MSR to 0. However,
the root partition is not allowed to write to the GPA bits of the MSR.
Instead, it uses the hypercall page provided by the MSR. The same applies
to the reference TSC MSR.

Clear only the enable bit instead of zeroing the entire MSR to make
the code valid for the root partition too.

Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20221027095729.1676394-3-anrayabh@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
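
The difference between zeroing the MSR and clearing only its enable bit, as described in 2982635a, can be sketched as follows. The sketch assumes the enable flag is bit 0 with the GPA bits above it; the function name and sample value are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: bit 0 = enable, upper bits hold the GPA. */
#define SKETCH_MSR_ENABLE 0x1ULL

/* Disable the feature while leaving the GPA bits untouched, so the
 * write stays valid where the GPA bits are read-only. */
static uint64_t sketch_disable_keep_gpa(uint64_t msr)
{
	return msr & ~SKETCH_MSR_ENABLE;
}
```

A read-modify-write of this shape never changes the GPA bits, which is the property the root partition needs.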


# 03b9a6e1 20-Oct-2022 Zhao Liu <zhao1.liu@intel.com>

x86/hyperv: Remove BUG_ON() for kmap_local_page()

The commit 154fb14df7a3c ("x86/hyperv: Replace kmap() with
kmap_local_page()") keeps the BUG_ON() to check if kmap_local_page()
fails.

But in fact, kmap_local_page() always returns a valid kernel address
and won't return NULL here. It will BUG on its own if it fails. [1]

So directly use memcpy_to_page() which creates local mapping to copy.

[1]: https://lore.kernel.org/lkml/YztFEyUA48et0yTt@iweiny-mobl/

Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20221020083820.2341088-1-zhao1.liu@linux.intel.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# 154fb14d 28-Sep-2022 Zhao Liu <zhao1.liu@intel.com>

x86/hyperv: Replace kmap() with kmap_local_page()

kmap() is being deprecated in favor of kmap_local_page()[1].

There are two main problems with kmap(): (1) It comes with an overhead as
mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when the
kmap's pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.

With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
the tasks can be preempted and, when they are scheduled to run again, the
kernel virtual addresses are restored and are still valid.

In the function hyperv_init() of hyperv/hv_init.c, the mapping is used in a
single thread and is short-lived. So, in this case, it's safe to simply use
kmap_local_page() to create the mapping, and this avoids the wasted cost of
kmap() for global synchronization.

In addition, the function hyperv_init() checks if kmap() fails by BUG_ON().
From the original discussion[2], the BUG_ON() here is just used to
explicitly panic on a NULL pointer. So still keep the BUG_ON() in place to
check if kmap_local_page() fails. Based on this consideration,
memcpy_to_page() is not selected here but only kmap_local_page() is used.

Therefore, replace kmap() with kmap_local_page() in hyperv/hv_init.c.

[1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com
[2]: https://lore.kernel.org/lkml/20200915103710.cqmdvzh5lys4wsqo@liuwe-devbox-debian-v2/

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20220928095640.626350-1-zhao1.liu@linux.intel.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# d5ebde1e 28-Sep-2022 Li kunyu <kunyu@nfschina.com>

hyperv: simplify and rename generate_guest_id

Make the generate_guest_id function easier to use with the following
modifications:

1. Change the return type of the function to u64.
2. Remove the d_info1 and d_info2 parameters from the function, keeping
only the u64 kernel_version parameter.
3. Rename the function to hv_generate_guest_id to make it clearly a
Hyper-V related function.

Signed-off-by: Li kunyu <kunyu@nfschina.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220928064046.3545-1-kunyu@nfschina.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
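
After d5ebde1e, the helper takes a single kernel_version argument and returns a u64. A user-space approximation is sketched below; the vendor ID value (0x8100) and the shift amounts are my recollection of the kernel header and should be treated as assumptions, and the sketch_ names are hypothetical.

```c
#include <stdint.h>

/* Assumed vendor ID for Linux guests; verify against the real header. */
#define SKETCH_HV_LINUX_VENDOR_ID 0x8100ULL

/* Builds a guest ID from the vendor ID and the kernel version only. */
static uint64_t sketch_hv_generate_guest_id(uint64_t kernel_version)
{
	uint64_t guest_id;

	guest_id = SKETCH_HV_LINUX_VENDOR_ID << 48;
	guest_id |= kernel_version << 16;

	return guest_id;
}
```

A caller would pass something like a packed kernel version code and write the result to the guest OS ID register.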


# 49d6a3c0 13-Jun-2022 Tianyu Lan <Tianyu.Lan@microsoft.com>

x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM

The current Hyper-V Isolation VM code uses sev_es_ghcb_hv_call()
to read/write MSRs via the GHCB page and so depends on the SEV code.
This may cause regressions when the SEV code changes its interface
design.

The latest SEV-ES code requires negotiating the GHCB version before
reading/writing MSRs via the GHCB page, so sev_es_ghcb_hv_call() doesn't
work for Hyper-V Isolation VMs. Add a Hyper-V specific GHCB implementation
to decouple the SEV and Hyper-V code. Negotiate the GHCB version in
hyperv_init() and use that version to communicate with Hyper-V
in the GHCB hv call function.

Fixes: 2ea29c5abbc2 ("x86/sev: Save the negotiated GHCB version")
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220614014553.1915929-1-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# e1878402 27-Dec-2021 Michael Kelley <mikelley@microsoft.com>

x86/hyperv: Fix definition of hv_ghcb_pg variable

The percpu variable hv_ghcb_pg is incorrectly defined. The __percpu
qualifier should be associated with the union hv_ghcb * (i.e.,
a pointer), not with the target of the pointer. This distinction
makes no difference to gcc and the generated code, but sparse
correctly complains. Fix the definition in the interest of
general correctness in addition to making sparse happy.

No functional change.

Fixes: 0cc4f6d9f0b9 ("x86/hyperv: Initialize GHCB page in Isolation VM")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1640662315-22260-2-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# 062a5c42 13-Dec-2021 Tianyu Lan <Tianyu.Lan@microsoft.com>

hyper-v: Enable swiotlb bounce buffer for Isolation VM

hyperv Isolation VM requires bounce buffer support to copy
data from/to encrypted memory and so enable swiotlb force
mode to use swiotlb bounce buffer for DMA transaction.

In Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via extra address space which is above shared_gpa_boundary
(E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
The access physical address will be original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
spec is called virtual top of memory(vTOM). Memory addresses below
vTOM are automatically treated as private while memory above
vTOM is treated as shared.

Swiotlb bounce buffer code calls set_memory_decrypted()
to mark bounce buffer visible to host and map it in extra
address space via memremap. Populate the shared_gpa_boundary
(vTOM) via swiotlb_unencrypted_base variable.

The map function memremap() can't work this early
(e.g., in ms_hyperv_init_platform()), so call
swiotlb_update_mem_attributes() in hyperv_init().

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20211213071407.314309-4-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
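
The address arithmetic described above can be sketched as follows. The helper names are hypothetical, and treating an address exactly equal to vTOM as shared is an assumption; the commit text only says "below" and "above".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: alias of a physical address in the shared space,
 * computed as original physical address + shared_gpa_boundary. */
uint64_t sketch_shared_alias(uint64_t pa, uint64_t shared_gpa_boundary)
{
	return pa + shared_gpa_boundary;
}

/* Hypothetical helper: addresses below vTOM are private.
 * Assumption: an address equal to vTOM counts as shared. */
bool sketch_is_private(uint64_t addr, uint64_t vtom)
{
	return addr < vtom;
}
```

For the 39-bit example in the text, the boundary would be 1ULL << 39, so a buffer at 0x1000 would be reached through 0x8000001000.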


# f3e613e7 04-Nov-2021 Sean Christopherson <seanjc@google.com>

x86/hyperv: Move required MSRs check to initial platform probing

Explicitly check for MSR_HYPERCALL and MSR_VP_INDEX support when probing
for running as a Hyper-V guest instead of waiting until hyperv_init() to
detect the bogus configuration. Add messages to give the admin a heads
up that they are likely running on a broken virtual machine setup.

At best, silently disabling Hyper-V is confusing and difficult to debug,
e.g. the kernel _says_ it's using all these fancy Hyper-V features, but
always falls back to the native versions. At worst, the half baked setup
will crash/hang the kernel.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20211104182239.1302956-3-seanjc@google.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# daf97211 04-Nov-2021 Sean Christopherson <seanjc@google.com>

x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails

Check for a valid hv_vp_index array prior to dereferencing hv_vp_index when
setting Hyper-V's TSC change callback. If Hyper-V setup failed in
hyperv_init(), the kernel will still report that it's running under
Hyper-V, but will have silently disabled nearly all functionality.

BUG: kernel NULL pointer dereference, address: 0000000000000010
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc2+ #75
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:set_hv_tscchange_cb+0x15/0xa0
Code: <8b> 04 82 8b 15 12 17 85 01 48 c1 e0 20 48 0d ee 00 01 00 f6 c6 08
...
Call Trace:
kvm_arch_init+0x17c/0x280
kvm_init+0x31/0x330
vmx_init+0xba/0x13a
do_one_initcall+0x41/0x1c0
kernel_init_freeable+0x1f2/0x23b
kernel_init+0x16/0x120
ret_from_fork+0x22/0x30

Fixes: 93286261de1b ("x86/hyperv: Reenlightenment notifications support")
Cc: stable@vger.kernel.org
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20211104182239.1302956-2-seanjc@google.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
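
The guard described above can be sketched in user space as follows. All names are hypothetical stand-ins, and the bit layout of the control value (vector in bits 0-7, enable in bit 16, target VP in bits 32-63) is an assumption for illustration, not taken from this log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the VP index array; NULL models a failed hyperv_init(). */
uint32_t *sketch_vp_index;

/* Returns false without touching the array if Hyper-V setup did not complete. */
bool sketch_build_reenlightenment_ctrl(unsigned int cpu, uint8_t vector,
				       uint64_t *ctrl)
{
	if (!sketch_vp_index)
		return false;

	/* Assumed layout: vector | enable bit 16 | target VP in the upper half. */
	*ctrl = ((uint64_t)sketch_vp_index[cpu] << 32) | (1ULL << 16) | vector;
	return true;
}
```

The point of the sketch is only the ordering: the array pointer is validated before it is indexed, so a half-initialized setup returns early instead of faulting.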


# 285f68af 12-Oct-2021 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyperv: Protect set_hv_tscchange_cb() against getting preempted

The following issue is observed with CONFIG_DEBUG_PREEMPT when KVM loads:

KVM: vmx: using Hyper-V Enlightened VMCS
BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/488
caller is set_hv_tscchange_cb+0x16/0x80
CPU: 1 PID: 488 Comm: systemd-udevd Not tainted 5.15.0-rc5+ #396
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
Call Trace:
dump_stack_lvl+0x6a/0x9a
check_preemption_disabled+0xde/0xe0
? kvm_gen_update_masterclock+0xd0/0xd0 [kvm]
set_hv_tscchange_cb+0x16/0x80
kvm_arch_init+0x23f/0x290 [kvm]
kvm_init+0x30/0x310 [kvm]
vmx_init+0xaf/0x134 [kvm_intel]
...

set_hv_tscchange_cb() can get preempted in between acquiring
smp_processor_id() and writing to HV_X64_MSR_REENLIGHTENMENT_CONTROL. This
is not an issue by itself: HV_X64_MSR_REENLIGHTENMENT_CONTROL is a
partition-wide MSR and it doesn't matter which particular CPU will be
used to receive reenlightenment notifications. The only real problem can
(in theory) be observed if the CPU whose id was acquired with
smp_processor_id() goes offline before we manage to write to the MSR,
the logic in hv_cpu_die() won't be able to reassign it correctly.

Reported-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20211012155005.1613352-1-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# c5989b92 26-Oct-2021 Wan Jiabing <wanjiabing@vivo.com>

x86/hyperv: Remove duplicated include in hv_init

Fix following checkinclude.pl warning:
./arch/x86/hyperv/hv_init.c: linux/io.h is included more than once.

The include is already on line 13. Remove the duplicate.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Link: https://lore.kernel.org/r/20211026113249.30481-1-wanjiabing@vivo.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# faff4406 25-Oct-2021 Tianyu Lan <Tianyu.Lan@microsoft.com>

x86/hyperv: Add Write/Read MSR registers via ghcb page

Hyper-V provides a GHCB protocol to write Synthetic Interrupt
Controller MSR registers in an Isolation VM with AMD SEV-SNP,
and these registers are emulated by the hypervisor directly.
Hyper-V requires writing the SINTx MSR registers twice: first
via the GHCB page to communicate with the hypervisor, and
then with the wrmsr instruction to talk to the paravisor,
which runs in VMPL0. The Guest OS ID MSR also needs to be set
via the GHCB page.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20211025122116.264793-7-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# 810a5212 25-Oct-2021 Tianyu Lan <Tianyu.Lan@microsoft.com>

x86/hyperv: Add new hvcall guest address host visibility support

Add new hvcall guest address host visibility support to mark
memory visible to the host. Call it inside
set_memory_decrypted()/set_memory_encrypted(). Add a HYPERVISOR
feature check in hv_is_isolation_supported() to optimize for
non-virtualized environments.

Acked-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20211025122116.264793-4-ltykernel@gmail.com
[ wei: fix conflicts with tip ]
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# 0cc4f6d9 25-Oct-2021 Tianyu Lan <Tianyu.Lan@microsoft.com>

x86/hyperv: Initialize GHCB page in Isolation VM

Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR for an SNP
guest to communicate with the hypervisor. Map the GHCB page for all
CPUs to read/write MSR registers and submit hvcall requests
via the GHCB page.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20211025122116.264793-2-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# e5d9b714 31-Jul-2021 Praveen Kumar <kumarpraveen@linux.microsoft.com>

x86/hyperv: fix root partition faults when writing to VP assist page MSR

For the root partition, the VP assist pages are pre-determined by the
hypervisor. The root kernel is not allowed to move them to
different locations. The current implementation nevertheless has
the root kernel write to the VP assist page MSR, which produces the
stack trace below.

[ 2.778197] unchecked MSR access error: WRMSR to 0x40000073 (tried to write 0x0000000145ac5001) at rIP: 0xffffffff810c1084 (native_write_msr+0x4/0x30)
[ 2.784867] Call Trace:
[ 2.791507] hv_cpu_init+0xf1/0x1c0
[ 2.798144] ? hyperv_report_panic+0xd0/0xd0
[ 2.804806] cpuhp_invoke_callback+0x11a/0x440
[ 2.811465] ? hv_resume+0x90/0x90
[ 2.818137] cpuhp_issue_call+0x126/0x130
[ 2.824782] __cpuhp_setup_state_cpuslocked+0x102/0x2b0
[ 2.831427] ? hyperv_report_panic+0xd0/0xd0
[ 2.838075] ? hyperv_report_panic+0xd0/0xd0
[ 2.844723] ? hv_resume+0x90/0x90
[ 2.851375] __cpuhp_setup_state+0x3d/0x90
[ 2.858030] hyperv_init+0x14e/0x410
[ 2.864689] ? enable_IR_x2apic+0x190/0x1a0
[ 2.871349] apic_intr_mode_init+0x8b/0x100
[ 2.878017] x86_late_time_init+0x20/0x30
[ 2.884675] start_kernel+0x459/0x4fb
[ 2.891329] secondary_startup_64_no_verify+0xb0/0xbb

Since the hypervisor already provides the VP assist pages for root
partition, we need to memremap the memory from hypervisor for root
kernel to use. The mapping is done in hv_cpu_init during bringup and is
unmapped in hv_cpu_die during teardown.

Signed-off-by: Praveen Kumar <kumarpraveen@linux.microsoft.com>
Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Link: https://lore.kernel.org/r/20210731120519.17154-1-kumarpraveen@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# 6dc77fa5 14-Jul-2021 Michael Kelley <mikelley@microsoft.com>

Drivers: hv: Move Hyper-V misc functionality to arch-neutral code

The check for whether hibernation is possible, and the enabling of
Hyper-V panic notification during kexec, are both architecture neutral.
Move the code from under arch/x86 and into drivers/hv/hv_common.c where
it can also be used for ARM64.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1626287687-2045-4-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# 9d7cf2c9 14-Jul-2021 Michael Kelley <mikelley@microsoft.com>

Drivers: hv: Add arch independent default functions for some Hyper-V handlers

Architecture independent Hyper-V code calls various arch-specific handlers
when needed. To aid in supporting multiple architectures, provide weak
defaults that can be overridden by arch-specific implementations where
appropriate. But when arch-specific overrides aren't needed or haven't
been implemented yet for a particular architecture, these stubs reduce
the amount of clutter under arch/.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1626287687-2045-3-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# afca4d95 14-Jul-2021 Michael Kelley <mikelley@microsoft.com>

Drivers: hv: Make portions of Hyper-V init code be arch neutral

The code to allocate and initialize the hv_vp_index array is
architecture neutral. Similarly, the code to allocate and
populate the hypercall input and output arg pages is architecture
neutral. Move both sets of code out from arch/x86 and into
utility functions in drivers/hv/hv_common.c that can be shared
by Hyper-V initialization on ARM64.

No functional changes. However, the allocation of the hypercall
input and output arg pages is done differently so that the
size is always the Hyper-V page size, even if not the same as
the guest page size (such as with ARM64's 64K page size).

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1626287687-2045-2-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# a4d7e8ae 02-Jun-2021 Michael Kelley <mikelley@microsoft.com>

Drivers: hv: Move Hyper-V extended capability check to arch neutral code

The extended capability query code is currently under arch/x86, but it
is architecture neutral, and is used by arch neutral code in the Hyper-V
balloon driver. Hence the balloon driver fails to build on other
architectures.

Fix by moving the ext cap code out from arch/x86. Because it is also
called from built-in architecture specific code, it can't be in a module,
so the Makefile treats as built-in even when CONFIG_HYPERV is "m". Also
drivers/Makefile is tweaked because this is the first occurrence of a
Hyper-V file that is built-in even when CONFIG_HYPERV is "m".

While here, update the hypercall status check to use the new helper
function instead of open coding. No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Link: https://lore.kernel.org/r/1622669804-2016-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# c4342633 12-May-2021 Ingo Molnar <mingo@kernel.org>

x86: Fix leftover comment typos

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 753ed9c9 16-Apr-2021 Joseph Salisbury <joseph.salisbury@microsoft.com>

drivers: hv: Create a consistent pattern for checking Hyper-V hypercall status

There is not a consistent pattern for checking Hyper-V hypercall status.
Existing code uses a number of variants. The variants work, but a consistent
pattern would improve the readability of the code, and be more conformant
to what the Hyper-V TLFS says about hypercall status.

Implemented new helper functions hv_result(), hv_result_success(), and
hv_repcomp(). Changed the places where hv_do_hypercall() and related variants
are used to use the helper functions.

Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1618620183-9967-2-git-send-email-joseph.salisbury@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
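
A user-space sketch of what such helpers might look like is given below. The names are prefixed with sketch_ to mark them as illustrative, and the bit positions (result code in bits 0-15, completed rep count in bits 32-43, success encoded as 0) are assumptions based on the TLFS description of the hypercall result value, not copied from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RESULT_MASK     0xFFFFULL         /* assumed: bits 0-15 */
#define SKETCH_REP_COMP_SHIFT  32
#define SKETCH_REP_COMP_MASK   (0xFFFULL << 32)  /* assumed: bits 32-43 */
#define SKETCH_STATUS_SUCCESS  0

/* Extract the result code from a raw hypercall status. */
unsigned int sketch_hv_result(uint64_t status)
{
	return (unsigned int)(status & SKETCH_RESULT_MASK);
}

/* True if the result code indicates success. */
bool sketch_hv_result_success(uint64_t status)
{
	return sketch_hv_result(status) == SKETCH_STATUS_SUCCESS;
}

/* Extract the number of completed reps of a rep hypercall. */
unsigned int sketch_hv_repcomp(uint64_t status)
{
	return (unsigned int)((status & SKETCH_REP_COMP_MASK) >>
			      SKETCH_REP_COMP_SHIFT);
}
```

Callers would then test sketch_hv_result_success(status) instead of masking the status by hand at each call site, which is the consistency the commit is after.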


# 6dc2a774 23-Mar-2021 Sunil Muthuswamy <sunilmut@microsoft.com>

x86/Hyper-V: Support for free page reporting

Linux has support for free page reporting now (36e66c554b5c) for
virtualized environment. On Hyper-V when virtually backed VMs are
configured, Hyper-V will advertise cold memory discard capability,
when supported. This patch adds the support to hook into the free
page reporting infrastructure and leverage the Hyper-V cold memory
discard hint hypercall to report/free these pages back to the host.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Tested-by: Matheus Castello <matheus@castello.eng.br>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/SN4PR2101MB0880121FA4E2FEC67F35C1DCC0649@SN4PR2101MB0880.namprd21.prod.outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# d9f6e12f 18-Mar-2021 Ingo Molnar <mingo@kernel.org>

x86: Fix various typos in comments

Fix ~144 single-word typos in arch/x86/ code comments.

Doing this in a single commit should reduce the churn.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org


# ec866be6 02-Mar-2021 Michael Kelley <mikelley@microsoft.com>

clocksource/drivers/hyper-v: Move handling of STIMER0 interrupts

STIMER0 interrupts are most naturally modeled as per-cpu IRQs. But
because x86/x64 doesn't have per-cpu IRQs, the core STIMER0 interrupt
handling machinery is done in code under arch/x86 and Linux IRQs are
not used. Adding support for ARM64 means adding equivalent code
using per-cpu IRQs under arch/arm64.

A better model is to treat per-cpu IRQs as the normal path (which it is
for modern architectures), and the x86/x64 path as the exception. Do this
by incorporating standard Linux per-cpu IRQ allocation into the main
STIMER0 driver code, and bypass it in the x86/x64 exception case. For
x86/x64, special case code is retained under arch/x86, but no STIMER0
interrupt handling code is needed under arch/arm64.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1614721102-2241-11-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# b548a774 02-Mar-2021 Michael Kelley <mikelley@microsoft.com>

Drivers: hv: vmbus: Move hyperv_report_panic_msg to arch neutral code

With the new Hyper-V MSR set function, hyperv_report_panic_msg() can be
architecture neutral, so move it out from under arch/x86 and merge into
hv_kmsg_dump(). This move also avoids needing a separate implementation
under arch/arm64.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-5-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# f3c5e63c 02-Mar-2021 Michael Kelley <mikelley@microsoft.com>

Drivers: hv: Redo Hyper-V synthetic MSR get/set functions

Current code defines a separate get and set macro for each Hyper-V
synthetic MSR used by the VMbus driver. Furthermore, the get macro
can't be converted to a standard function because the second argument
is modified in place, which is somewhat bad form.

Redo this by providing a single get and a single set function that
take a parameter specifying the MSR to be operated on. Fixup usage
of the get function. Calling locations are no more complex than before,
but the code under arch/x86 and the upcoming code under arch/arm64
is significantly simplified.

Also standardize the names of Hyper-V synthetic MSRs that are
architecture neutral. But keep the old x86-specific names as aliases
that can be removed later when all references (particularly in KVM
code) have been cleaned up in a separate patch series.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-4-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
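
The difference between the two styles can be sketched in user space as follows. The array stands in for the synthetic MSRs, and every name here is hypothetical; the real functions access the hardware and are not reproduced.

```c
#include <stdint.h>

/* Hypothetical register indices standing in for synthetic MSR numbers. */
enum sketch_msr {
	SKETCH_MSR_SIMP,
	SKETCH_MSR_SIEFP,
	SKETCH_MSR_SCONTROL,
	SKETCH_MSR_EOM,
	SKETCH_NR_MSRS
};

/* Fake register file so the sketch runs without a hypervisor. */
uint64_t sketch_msr_file[SKETCH_NR_MSRS];

/* Old style: a macro that writes its result into the second argument. */
#define sketch_old_get(reg, val) ((val) = sketch_msr_file[(reg)])

/* New style: one getter returning the value; reg must be a valid index. */
uint64_t sketch_get_register(unsigned int reg)
{
	return sketch_msr_file[reg];
}

/* New style: one setter parameterized by the register. */
void sketch_set_register(unsigned int reg, uint64_t value)
{
	sketch_msr_file[reg] = value;
}
```

Because the getter returns its value instead of assigning through a macro argument, it can be an ordinary function, which is the property the commit message points to.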


# ca48739e 02-Mar-2021 Michael Kelley <mikelley@microsoft.com>

Drivers: hv: vmbus: Move Hyper-V page allocator to arch neutral code

The Hyper-V page allocator functions are implemented in an architecture
neutral way. Move them into the architecture neutral VMbus module so
a separate implementation for ARM64 is not needed.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-2-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# e39397d1 03-Feb-2021 Wei Liu <wei.liu@kernel.org>

x86/hyperv: implement an MSI domain for root partition

When Linux runs as the root partition on Microsoft Hypervisor, its
interrupts are remapped. Linux will need to explicitly map and unmap
interrupts for hardware.

Implement an MSI domain to issue the correct hypercalls. And initialize
this irq domain as the default MSI irq domain.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210203150435.27941-16-wei.liu@kernel.org


# 80f73c9f 03-Feb-2021 Wei Liu <wei.liu@kernel.org>

x86/hyperv: handling hypercall page setup for root

When Linux is running as the root partition, the hypercall page will
have already been set up by Hyper-V. Copy the content over to the
allocated page.

Add checks to hv_suspend & co to bail early because they are not
supported in this setup yet.

Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Co-Developed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210203150435.27941-8-wei.liu@kernel.org


# 99a0f46a 03-Feb-2021 Wei Liu <wei.liu@kernel.org>

x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary

We will need the partition ID for executing some hypercalls later.

Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210203150435.27941-7-wei.liu@kernel.org


# 5d0f077e 03-Feb-2021 Wei Liu <wei.liu@kernel.org>

x86/hyperv: allocate output arg pages if required

When Linux runs as the root partition, it will need to make hypercalls
which return data from the hypervisor.

Allocate pages for storing results when Linux runs as the root
partition.

Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210203150435.27941-6-wei.liu@kernel.org


# a6c76bb0 01-Feb-2021 Andrea Parri (Microsoft) <parri.andrea@gmail.com>

x86/hyperv: Load/save the Isolation Configuration leaf

If bit 22 of Group B Features is set, the guest has access to the
Isolation Configuration CPUID leaf. On x86, the first four bits
of EAX in this leaf provide the isolation type of the partition;
we define three isolation types: 'SNP' (hardware-based isolation),
'VBS' (software-based isolation), and 'NONE' (no isolation).

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: x86@kernel.org
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/r/20210201144814.2701-2-parri.andrea@gmail.com
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
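
The decoding described above can be sketched as follows. The register and bit positions follow the commit text (bit 22 of Group B Features, low four bits of EAX); the numeric values assigned to the three types and all identifier names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FEATURE_ISOLATION    (1U << 22)  /* bit 22 of Group B Features */
#define SKETCH_ISOLATION_TYPE_MASK  0xFU        /* first four bits of EAX */

/* Assumed numeric encoding of the three isolation types. */
enum sketch_isolation_type {
	SKETCH_ISOLATION_NONE = 0,
	SKETCH_ISOLATION_VBS  = 1,
	SKETCH_ISOLATION_SNP  = 2
};

/* True if the guest may read the Isolation Configuration leaf. */
bool sketch_has_isolation_leaf(uint32_t features_b)
{
	return (features_b & SKETCH_FEATURE_ISOLATION) != 0;
}

/* Extract the isolation type from the leaf's EAX value. */
unsigned int sketch_isolation_type(uint32_t eax)
{
	return eax & SKETCH_ISOLATION_TYPE_MASK;
}
```

A caller would check sketch_has_isolation_leaf() first and only then interpret the leaf, since the leaf is not guaranteed to exist when the feature bit is clear.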


# fff7b5e6 16-Jan-2021 Dexuan Cui <decui@microsoft.com>

x86/hyperv: Initialize clockevents after LAPIC is initialized

With commit 4df4cb9e99f8, the Hyper-V direct-mode STIMER is actually
initialized before LAPIC is initialized: see

apic_intr_mode_init()

  x86_platform.apic_post_init()
    hyperv_init()
      hv_stimer_alloc()

  apic_bsp_setup()
    setup_local_APIC()

setup_local_APIC() temporarily disables LAPIC, initializes it and
re-enables it. The direct-mode STIMER depends on LAPIC, and when it's
registered, it can be programmed immediately and the timer can fire
very soon:

hv_stimer_init
  clockevents_config_and_register
    clockevents_register_device
      tick_check_new_device
        tick_setup_device
          tick_setup_periodic(), tick_setup_oneshot()
            clockevents_program_event

When the timer fires in the hypervisor, if the LAPIC is in the
disabled state, new versions of Hyper-V ignore the event and don't inject
the timer interrupt into the VM, and hence the VM hangs when it boots.

Note: when the VM starts/reboots, the LAPIC is pre-enabled by the
firmware, so the window of LAPIC being temporarily disabled is pretty
small, and the issue can only happen once out of 100~200 reboots for
a 40-vCPU VM on one dev host, and on another host the issue doesn't
reproduce after 2000 reboots.

The issue is more noticeable for kdump/kexec, because the LAPIC is
disabled by the first kernel, and stays disabled until the kdump/kexec
kernel enables it. This is especially an issue to a Generation-2 VM
(for which Hyper-V doesn't emulate the PIT timer) when CONFIG_HZ=1000
(rather than CONFIG_HZ=250) is used.

Fix the issue by moving hv_stimer_alloc() to a later place where the
LAPIC timer is initialized.

Fixes: 4df4cb9e99f8 ("x86/hyperv: Initialize clockevents earlier in CPU onlining")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210116223136.13892-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# dfe94d40 21-Dec-2020 Dexuan Cui <decui@microsoft.com>

x86/hyperv: Fix kexec panic/hang issues

Currently the kexec kernel can panic or hang due to 2 causes:

1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
old VP Assist Pages when the kexec kernel runs. The same issue is fixed
for hibernation in commit 421f090c819d ("x86/hyperv: Suspend/resume the
VP assist page for hibernation"). Now fix it for kexec.

2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
between hv_kexec_handler() and native_machine_shutdown(), the other CPUs
can still try to access the hypercall page and cause panic. The workaround
"hv_hypercall_pg = NULL;" in hyperv_cleanup() is unreliable. Move
hyperv_cleanup() to a better place.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20201222065541.24312-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# e1471463 26-Sep-2020 Joseph Salisbury <joseph.salisbury@microsoft.com>

x86/hyperv: Remove aliases with X64 in their name

In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b
removed the "X64" in the symbol names so they would make sense for both x86 and
ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h
so that existing x86 code would continue to compile.

As a cleanup, update the x86 code to use the symbols without the "X64", then remove
the aliases. There's no functional change.

Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com>
Link: https://lore.kernel.org/r/1601130386-11111-1-git-send-email-jsalisbury@linux.microsoft.com
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# dfc53baa 26-Sep-2020 Joseph Salisbury <joseph.salisbury@microsoft.com>

x86/hyperv: Remove aliases with X64 in their name

In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b
removed the "X64" in the symbol names so they would make sense for both x86 and
ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h
so that existing x86 code would continue to compile.

As a cleanup, update the x86 code to use the symbols without the "X64", then remove
the aliases. There's no functional change.

Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/1601130386-11111-1-git-send-email-jsalisbury@linux.microsoft.com


# a3a66c38 03-Jul-2020 Christoph Hellwig <hch@lst.de>

vmalloc: fix the owner argument for the new __vmalloc_node_range callers

Fix the recently added new __vmalloc_node_range callers to pass the
correct values as the owner for display in /proc/vmallocinfo.

Fixes: 800e26b81311 ("x86/hyperv: allocate the hypercall page with only read and execute bits")
Fixes: 10d5e97c1bf8 ("arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page")
Fixes: 7a0e27b2a0ce ("mm: remove vmalloc_exec")
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200627075649.2455097-1-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 800e26b8 25-Jun-2020 Christoph Hellwig <hch@lst.de>

x86/hyperv: allocate the hypercall page with only read and execute bits

Patch series "fix a hyperv W^X violation and remove vmalloc_exec"

Dexuan reported a W^X violation in the Hyper-V hypercall page, caused
by switching it to be allocated using vmalloc_exec.

The problem is that PAGE_KERNEL_EXEC as used by vmalloc_exec actually
sets writable permissions in the pte. This series fixes the issue by
switching to the low-level __vmalloc_node_range interface that allows
specifying more detailed permissions instead. It then also open codes
the other two callers and removes the somewhat confusing vmalloc_exec
interface.

Peter noted that the Hyper-V hypercall page allocation also has another
long standing issue in that it shouldn't use the full vmalloc but just
the module space. This issue is so far theoretical as the allocation is
done early in the boot process. I plan to fix it with another bigger
series for 5.9.

This patch (of 3):

Avoid a W^X violation caused by the fact that PAGE_KERNEL_EXEC includes
the writable bit.

For this resurrect the removed PAGE_KERNEL_RX definition, but as
PAGE_KERNEL_ROX to match arm64 and powerpc.

Link: http://lkml.kernel.org/r/20200618064307.32739-2-hch@lst.de
Fixes: 78bb17f76edc ("x86/hyperv: use vmalloc_exec for the hypercall page")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a16be368 21-May-2020 Thomas Gleixner <tglx@linutronix.de>

x86/entry: Convert various hypervisor vectors to IDTENTRY_SYSVEC

Convert various hypervisor vectors to IDTENTRY_SYSVEC:

- Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
- Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
- Remove the ASM idtentries in 64-bit
- Remove the BUILD_INTERRUPT entries in 32-bit
- Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.647997594@linutronix.de


# 88dca4ca 01-Jun-2020 Christoph Hellwig <hch@lst.de>

mm: remove the pgprot argument to __vmalloc

The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv]
Acked-by: Gao Xiang <xiang@kernel.org> [erofs]
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 78bb17f7 01-Jun-2020 Christoph Hellwig <hch@lst.de>

x86/hyperv: use vmalloc_exec for the hypercall page

Patch series "decruft the vmalloc API", v2.

Peter noticed that with some dumb luck you can toast the kernel address
space with exported vmalloc symbols.

I used this as an opportunity to decruft the vmalloc.c API and make it
much more systematic. This also removes any chance to create vmalloc
mappings outside the designated areas or using executable permissions
from modules. Besides that it removes more than 300 lines of code.

This patch (of 29):

Use the designated helper for allocating executable kernel memory, and
remove the now unused PAGE_KERNEL_RX define.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lkml.kernel.org/r/20200414131348.444715-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200414131348.444715-2-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 38dce419 12-May-2020 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyperv: Properly suspend/resume reenlightenment notifications

Errors during hibernation with reenlightenment notifications enabled were
reported:

[ 51.730435] PM: hibernation entry
[ 51.737435] PM: Syncing filesystems ...
...
[ 54.102216] Disabling non-boot CPUs ...
[ 54.106633] smpboot: CPU 1 is now offline
[ 54.110006] unchecked MSR access error: WRMSR to 0x40000106 (tried to write 0x47c72780000100ee) at rIP: 0xffffffff90062f24 (native_write_msr+0x4/0x20)
[ 54.110006] Call Trace:
[ 54.110006] hv_cpu_die+0xd9/0xf0
...

Normally, hv_cpu_die() just reassigns reenlightenment notifications to some
other CPU when the CPU receiving them goes offline. Upon hibernation, there
is no other CPU which is still online so cpumask_any_but(cpu_online_mask)
returns >= nr_cpu_ids and using it as hv_vp_index index is incorrect.
Disable the feature when cpumask_any_but() fails.

Also, as we now disable reenlightenment notifications upon hibernation we
need to restore them on resume. Check if hv_reenlightenment_cb was
previously set and restore from hv_resume().
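
The fallback described above can be modelled in plain user-space C. This is only an illustrative sketch, not the kernel code: pick_other_online_cpu() stands in for cpumask_any_but(), and NR_CPUS_MODEL, REENLIGHTENMENT_DISABLED and reenlightenment_target() are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4                 /* hypothetical CPU count for this model */
#define REENLIGHTENMENT_DISABLED (-1)   /* hypothetical "feature turned off" marker */

/*
 * Model of cpumask_any_but(): return an online CPU other than 'cpu',
 * or a value >= NR_CPUS_MODEL when no such CPU exists.
 */
unsigned int pick_other_online_cpu(const bool online[NR_CPUS_MODEL],
                                   unsigned int cpu)
{
    for (unsigned int i = 0; i < NR_CPUS_MODEL; i++)
        if (i != cpu && online[i])
            return i;
    return NR_CPUS_MODEL;
}

/*
 * Decide where notifications should go when 'dying_cpu' goes offline.
 * An out-of-range result must never be used as an array index, so the
 * feature is reported as disabled instead.
 */
int reenlightenment_target(const bool online[NR_CPUS_MODEL],
                           unsigned int dying_cpu)
{
    assert(dying_cpu < NR_CPUS_MODEL);

    unsigned int new_cpu = pick_other_online_cpu(online, dying_cpu);

    if (new_cpu >= NR_CPUS_MODEL)
        return REENLIGHTENMENT_DISABLED;
    return (int)new_cpu;
}
```

With two CPUs online the model picks the other one; with only the dying CPU online (the hibernation case) it reports the feature as disabled, which is the state a resume path would then have to undo.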

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20200512160153.134467-1-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# 421f090c 20-Apr-2020 Dexuan Cui <decui@microsoft.com>

x86/hyperv: Suspend/resume the VP assist page for hibernation

Unlike the other CPUs, CPU0 is never offlined during hibernation, so in the
resume path, the "new" kernel's VP assist page is not suspended (i.e. not
disabled), and later when we jump to the "old" kernel, the page is not
properly re-enabled for CPU0 with the allocated page from the old kernel.

So far, the VP assist page is used by hv_apic_eoi_write(), and is also
used in the case of nested virtualization (running KVM atop Hyper-V).

For hv_apic_eoi_write(), when the page is not properly re-enabled,
hvp->apic_assist is always 0, so the HV_X64_MSR_EOI MSR is always written.
This is not ideal with respect to performance, but Hyper-V can still
correctly handle this according to the Hyper-V spec; nevertheless, Linux
still must update the Hyper-V hypervisor with the correct VP assist page
to prevent Hyper-V from writing to the stale page, which causes guest
memory corruption and consequently may have caused the hangs and triple
faults seen during non-boot CPUs resume.

Fix the issue by calling hv_cpu_die()/hv_cpu_init() in the syscore ops.
Without the fix, hibernation can fail at a rate of 1/300 ~ 1/500.
With the fix, hibernation can pass a long-haul test of 2000 runs.

In the case of nested virtualization, disabling/reenabling the assist
page upon hibernation may be unsafe if there are active L2 guests.
It looks like KVM should be enhanced to abort the hibernation request if
there is any active L2 guest.

Fixes: 05bd330a7fd8 ("x86/hyperv: Suspend/resume the hypercall page for hibernation")
Cc: stable@vger.kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/1587437171-2472-1-git-send-email-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# f3a99e76 06-Apr-2020 Tianyu Lan <Tianyu.Lan@microsoft.com>

x86/Hyper-V: Report crash data in die() when panic_on_oops is set

When an oops happens with panic_on_oops unset, the oops
thread is killed by die() and the system continues to run.
In that case, the guest should not report crash register
data to the host, since the system is still running. Check
panic_on_oops in hyperv_report_panic() and return directly
when the function is called from die() and panic_on_oops is unset.

Fixes: 7ed4325a44ea ("Drivers: hv: vmbus: Make panic reporting to be more useful")
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20200406155331.2105-7-Tianyu.Lan@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>


# 05bd330a 06-Jan-2020 Dexuan Cui <decui@microsoft.com>

x86/hyperv: Suspend/resume the hypercall page for hibernation

For hibernation the hypercall page must be disabled before the hibernation
image is created so that subsequent hypercall operations fail safely. On
resume the hypercall page has to be restored and reenabled to ensure proper
operation of the resumed kernel.

Implement the necessary suspend/resume callbacks.

[ tglx: Decrypted changelog ]

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1578350559-130275-1-git-send-email-decui@microsoft.com


# b96f8653 20-Nov-2019 Dexuan Cui <decui@microsoft.com>

x86/hyperv: Implement hv_is_hibernation_supported()

The API will be used by the hv_balloon and hv_vmbus drivers.

Balloon up/down and hot-add of memory must not be active if the user
wants the Linux VM to support hibernation, because they are incompatible
with hibernation according to Hyper-V team, e.g. upon suspend the
balloon VSP doesn't save any info about the ballooned-out pages (if any);
so, after Linux resumes, Linux balloon VSC expects that the VSP will
return the pages if Linux is under memory pressure, but the VSP will
never do that, since the VSP thinks it never stole the pages from the VM.

So, if the user wants Linux VM to support hibernation, Linux must forbid
balloon up/down and hot-add, and the only functionality of the balloon VSC
driver is reporting the VM's memory pressure to the host.

Ideally, when Linux detects that the user wants it to support hibernation,
the balloon VSC should tell the VSP that it does not support ballooning
and hot-add. However, the current version of the VSP requires the VSC
should support these capabilities, otherwise the capability negotiation
fails and the VSC can not load at all, so with the later changes to the
VSC driver, Linux VM still reports to the VSP that the VSC supports these
capabilities, but the VSC ignores the VSP's requests of balloon up/down
and hot add, and reports an error to the VSP, when applicable. BTW, in
the future the balloon VSP driver will allow the VSC to not support the
capabilities of balloon up/down and hot add.

The ACPI S4 state is not a must for hibernation to work, because Linux is
able to hibernate as long as the system can shut down. However in practice
we decide to artificially use the presence of the virtual ACPI S4 state as
an indicator of the user's intent of using hibernation, because Linux VM
must find a way to know if the user wants to use the hibernation feature
or not.

By default, Hyper-V does not enable the virtual ACPI S4 state; on recent
Hyper-V hosts (e.g. RS5, 19H1), the administrator is able to enable the
state for a VM by WMI commands.

Once all the vmbus and VSC patches for the hibernation feature are
accepted, an extra patch will be submitted to forbid hibernation if the
virtual ACPI S4 state is absent, i.e. hv_is_hibernation_supported() is
false.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>


# fa36dcdf 30-Jul-2019 Himadri Pandya <himadrispandya@gmail.com>

x86: hv: Add function to allocate zeroed page for Hyper-V

Hyper-V assumes page size to be 4K. While this assumption holds true on
x86 architecture, it might not be true for ARM64 architecture. Hence
define hyper-v specific function to allocate a zeroed page which can
have a different implementation on ARM64 architecture to handle the
conflict between hyper-v's assumed page size and actual guest page size.

Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>


# 4df4cb9e9 12-Nov-2019 Michael Kelley <mikelley@microsoft.com>

x86/hyperv: Initialize clockevents earlier in CPU onlining

Hyper-V has historically initialized stimer-based clockevents late in the
process of onlining a CPU because clockevents depend on stimer
interrupts. In the original Hyper-V design, stimer interrupts generate a
VMbus message, so the VMbus machinery must be running first, and VMbus
can't be initialized until relatively late. On x86/64, LAPIC timer based
clockevents are used during early initialization before VMbus and
stimer-based clockevents are ready, and again during CPU offlining after
the stimer clockevents have been shut down.

Unfortunately, this design creates problems when offlining CPUs for
hibernation or other purposes. stimer-based clockevents are shut down
relatively early in the offlining process, so clockevents_unbind_device()
must be used to fallback to the LAPIC-based clockevents for the remainder
of the offlining process. Furthermore, the late initialization and early
shutdown of stimer-based clockevents doesn't work well on ARM64 since there
is no other timer like the LAPIC to fallback to. So CPU onlining and
offlining doesn't work properly.

Fix this by recognizing that stimer Direct Mode is the normal path for
newer versions of Hyper-V on x86/64, and the only path on other
architectures. With stimer Direct Mode, stimer interrupts don't require any
VMbus machinery. stimer clockevents can be initialized and shut down
consistent with how it is done for other clockevent devices. While the old
VMbus-based stimer interrupts must still be supported for backward
compatibility on x86, that mode of operation can be treated as legacy.

So add a new Hyper-V stimer entry in the CPU hotplug state list, and use
that new state when in Direct Mode. Update the Hyper-V clocksource driver
to allocate and initialize stimer clockevents earlier during boot. Update
Hyper-V initialization and the VMbus driver to use this new design. As a
result, the LAPIC timer is no longer used during boot or CPU
onlining/offlining and clockevents_unbind_device() is not called. But
retain the old design as a legacy implementation for older versions of
Hyper-V that don't support Direct Mode.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lkml.kernel.org/r/1573607467-9456-1-git-send-email-mikelley@microsoft.com


# bd00cd52 14-Aug-2019 Tianyu Lan <Tianyu.Lan@microsoft.com>

clocksource/drivers/hyperv: Add Hyper-V specific sched clock function

Hyper-V guests use the default native_sched_clock() in
pv_ops.time.sched_clock on x86. But native_sched_clock() directly uses the
raw TSC value, which can be discontinuous in a Hyper-V VM.

Add the generic hv_setup_sched_clock() to set the sched clock function
appropriately. On x86, this sets pv_ops.time.sched_clock to read the
Hyper-V reference TSC value that is scaled and adjusted to be continuous.

Also move the Hyper-V reference TSC initialization much earlier in the boot
process so no discontinuity is observed when pv_ops.time.sched_clock
calculates its offset.

[ tglx: Folded build fix ]

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lkml.kernel.org/r/20190814123216.32245-3-Tianyu.Lan@microsoft.com


# 8c3e44bd 12-Jul-2019 Maya Nakamura <m.maya.nakamura@gmail.com>

x86/hyperv: Add functions to allocate/deallocate page for Hyper-V

Introduce two new functions, hv_alloc_hyperv_page() and
hv_free_hyperv_page(), to allocate/deallocate memory with the size and
alignment that Hyper-V expects as a page. Although currently they are not
used, they are ready to be used to allocate/deallocate memory on x86 when
their ARM64 counterparts are implemented, keeping symmetry between
architectures with potentially different guest page sizes.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.1906272334560.32342@nanos.tec.linutronix.de/
Link: https://lore.kernel.org/lkml/87muindr9c.fsf@vitty.brq.redhat.com/
Link: https://lkml.kernel.org/r/706b2e71eb3e587b5f8801e50f090fae2a00e35d.1562916939.git.m.maya.nakamura@gmail.com


# e320ab3c 18-Jul-2019 Dexuan Cui <decui@microsoft.com>

x86/hyper-v: Zero out the VP ASSIST PAGE on allocation

The VP ASSIST PAGE is an "overlay" page (see Hyper-V TLFS's Section
5.2.1 "GPA Overlay Pages" for the details) and here is an excerpt:

"The hypervisor defines several special pages that "overlay" the guest's
Guest Physical Addresses (GPA) space. Overlays are addressed GPA but are
not included in the normal GPA map maintained internally by the hypervisor.
Conceptually, they exist in a separate map that overlays the GPA map.

If a page within the GPA space is overlaid, any SPA page mapped to the
GPA page is effectively "obscured" and generally unreachable by the
virtual processor through processor memory accesses.

If an overlay page is disabled, the underlying GPA page is "uncovered",
and an existing mapping becomes accessible to the guest."

SPA = System Physical Address = the final real physical address.

When a CPU (e.g. CPU1) is onlined, hv_cpu_init() allocates the VP ASSIST
PAGE and enables the EOI optimization for this CPU by writing the MSR
HV_X64_MSR_VP_ASSIST_PAGE. From now on, hvp->apic_assist belongs to the
special SPA page, and this CPU *always* uses hvp->apic_assist (which is
shared with the hypervisor) to decide if it needs to write the EOI MSR.

When a CPU is offlined then on the outgoing CPU:
1. hv_cpu_die() disables the EOI optimization for this CPU, and from
now on hvp->apic_assist belongs to the original "normal" SPA page;
2. the remaining work of stopping this CPU is done
3. this CPU is completely stopped.

Between 1 and 3, this CPU can still receive interrupts (e.g. reschedule
IPIs from CPU0, and Local APIC timer interrupts), and this CPU *must* write
the EOI MSR for every interrupt received, otherwise the hypervisor may not
deliver further interrupts, which may be needed to completely stop the CPU.

So, after the EOI optimization is disabled in hv_cpu_die(), it's required
that the hvp->apic_assist's bit0 is zero, which is not guaranteed by the
current allocation mode because it lacks __GFP_ZERO. As a consequence the
bit might be set and interrupt handling would not write the EOI MSR causing
interrupt delivery to become stuck.

Add the missing __GFP_ZERO to the allocation.

Note 1: after the "normal" SPA page is allocated and zeroed out, neither the
hypervisor nor the guest writes into the page, so the page remains with
zeros.

Note 2: see Section 10.3.5 "EOI Assist" for the details of the EOI
optimization. When the optimization is enabled, the guest can still write
the EOI MSR register irrespective of the "No EOI required" value, but
that's slower than the optimized assist based variant.
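
The dependency on bit 0 of apic_assist can be illustrated with a small user-space model. This is a simplified sketch, not the kernel's hv_apic_eoi_write(): struct vp_assist_model and eoi_msr_write_needed() are hypothetical names, and the real code uses an atomic exchange and a real MSR write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the apic_assist word of the VP assist page. */
struct vp_assist_model {
    uint32_t apic_assist;   /* bit 0: "No EOI required" */
};

/*
 * Returns true when the caller must still write the EOI MSR.
 * If bit 0 is set, the word is cleared and the MSR write is skipped.
 */
bool eoi_msr_write_needed(struct vp_assist_model *hvp)
{
    assert(hvp != NULL);

    uint32_t old = hvp->apic_assist;

    hvp->apic_assist = 0;
    return (old & 0x1u) == 0;
}
```

On a zeroed page the function always asks for the MSR write, which is the safe behavior once the overlay is disabled. On a page that happens to contain garbage with bit 0 set, the first interrupt after offlining would skip the EOI write, which is the stuck-interrupt case the missing __GFP_ZERO allowed.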

Fixes: ba696429d290 ("x86/hyper-v: Implement EOI assist")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/ <PU1P153MB0169B716A637FABF07433C04BFCB0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM


# dd2cb348 30-Jun-2019 Michael Kelley <mikelley@microsoft.com>

clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic

Continue consolidating Hyper-V clock and timer code into an ISA
independent Hyper-V clocksource driver.

Move the existing clocksource code under drivers/hv and arch/x86 to the new
clocksource driver while separating out the ISA dependencies. Update
Hyper-V initialization to call initialization and cleanup routines since
the Hyper-V synthetic clock is not independently enumerated in ACPI.

Update Hyper-V clocksource users in KVM and VDSO to get definitions from
the new include file.

No behavior is changed and no new functionality is added.

Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "bp@alien8.de" <bp@alien8.de>
Cc: "will.deacon@arm.com" <will.deacon@arm.com>
Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>
Cc: "mark.rutland@arm.com" <mark.rutland@arm.com>
Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>
Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>
Cc: "olaf@aepfle.de" <olaf@aepfle.de>
Cc: "apw@canonical.com" <apw@canonical.com>
Cc: "jasowang@redhat.com" <jasowang@redhat.com>
Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
Cc: Sunil Muthuswamy <sunilmut@microsoft.com>
Cc: KY Srinivasan <kys@microsoft.com>
Cc: "sashal@kernel.org" <sashal@kernel.org>
Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com>
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>
Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org>
Cc: "arnd@arndb.de" <arnd@arndb.de>
Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk>
Cc: "ralf@linux-mips.org" <ralf@linux-mips.org>
Cc: "paul.burton@mips.com" <paul.burton@mips.com>
Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
Cc: "salyzyn@android.com" <salyzyn@android.com>
Cc: "pcc@google.com" <pcc@google.com>
Cc: "shuah@kernel.org" <shuah@kernel.org>
Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com>
Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk>
Cc: "huw@codeweavers.com" <huw@codeweavers.com>
Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>
Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>
Link: https://lkml.kernel.org/r/1561955054-1838-3-git-send-email-mikelley@microsoft.com


# 43aa3132 29-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 280

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose good title or non infringement see
the gnu general public license for more details

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 9 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141900.459653302@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 534c89c2 13-Mar-2019 Kangjie Lu <kjlu@umn.edu>

x86/hyperv: Prevent potential NULL pointer dereference

The page allocation in hv_cpu_init() can fail, but the code does not
have a check for that.

Add a check and return -ENOMEM when the allocation fails.

[ tglx: Massaged changelog ]

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Acked-by: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: pakki001@umn.edu
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-hyperv@vger.kernel.org
Link: https://lkml.kernel.org/r/20190314054651.1315-1-kjlu@umn.edu


# 179fb36a 06-Mar-2019 Kairui Song <kasong@redhat.com>

x86/hyperv: Fix kernel panic when kexec on HyperV

After commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments"),
kexec fails with a kernel panic:

kexec_core: Starting new kernel
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
RIP: 0010:0xffffc9000001d000

Call Trace:
? __send_ipi_mask+0x1c6/0x2d0
? hv_send_ipi_mask_allbutself+0x6d/0xb0
? mp_save_irq+0x70/0x70
? __ioapic_read_entry+0x32/0x50
? ioapic_read_entry+0x39/0x50
? clear_IO_APIC_pin+0xb8/0x110
? native_stop_other_cpus+0x6e/0x170
? native_machine_shutdown+0x22/0x40
? kernel_kexec+0x136/0x156

That happens if hypercall based IPIs are used because the hypercall page is
reset very early upon kexec reboot, but kexec sends IPIs to stop CPUs,
which invokes the hypercall and dereferences the unusable page.

To fix this, reset hv_hypercall_pg to NULL before the page is reset to avoid
any misuse; IPI sending will then fall back to the non-hypercall-based
method. This only happens on kexec / kdump, so just setting the pointer to
NULL is good enough.
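
The ordering matters more than the mechanism, and a minimal user-space sketch shows it. The names here are hypothetical (hypercall_page_model stands in for hv_hypercall_pg, and the enum and helper functions exist only for this example); the real cleanup path also has to tell the hypervisor to disable the page.

```c
#include <assert.h>
#include <stddef.h>

enum ipi_method { IPI_VIA_HYPERCALL, IPI_VIA_FALLBACK };

/* Hypothetical stand-in for the hv_hypercall_pg pointer. */
void *hypercall_page_model;

/* Record a usable hypercall page (done once during initialization). */
void install_hypercall_page_model(void *page)
{
    assert(page != NULL);
    hypercall_page_model = page;
}

/* IPI senders check the pointer and fall back when it is NULL. */
enum ipi_method choose_ipi_method(void)
{
    if (hypercall_page_model == NULL)
        return IPI_VIA_FALLBACK;
    return IPI_VIA_HYPERCALL;
}

/*
 * Cleanup before kexec: clear the pointer first, so that any later
 * caller sees NULL; only after that would the page itself be reset.
 */
void cleanup_for_kexec_model(void)
{
    hypercall_page_model = NULL;
    /* ...the page would be disabled in the hypervisor here... */
}
```

After cleanup_for_kexec_model() runs, choose_ipi_method() selects the fallback, so the IPIs sent while stopping the other CPUs never touch the reset page.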

Fixes: 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments")
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: devel@linuxdriverproject.org
Link: https://lkml.kernel.org/r/20190306111827.14131-1-kasong@redhat.com


# c8ccf759 01-Mar-2019 Maya Nakamura <m.maya.nakamura@gmail.com>

PCI: hv: Refactor hv_irq_unmask() to use cpumask_to_vpset()

Remove the duplicate implementation of cpumask_to_vpset() and use the
shared implementation. Export hv_max_vp_index, which is required by
cpumask_to_vpset().

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>


# 2f285f46 18-Sep-2018 Dexuan Cui <decui@microsoft.com>

x86/hyperv: Suppress "PCI: Fatal: No config space access function found"

A Generation-2 Linux VM on Hyper-V doesn't have the legacy PCI bus, and
users always see the scary warning, which is actually harmless.

Suppress it.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: KY Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org>
Cc: Olaf Aepfle <olaf@aepfle.de>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
Cc: Josh Poulson <jopoulso@microsoft.com>
Link: https://lkml.kernel.org/r/ <KU1P153MB0166D977DC930996C4BF538ABF1D0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM


# 81b18bce 07-Jul-2018 Sunil Muthuswamy <sunilmut@microsoft.com>

Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic

In the VM mode on Hyper-V, currently, when the kernel panics, an error
code and a few register values are populated in an MSR and the hypervisor
is notified. This information is collected on the host. The amount of
information currently collected is found to be limited and not very
actionable. To gather more actionable data, such as a stack trace, the
proposal is to write one page worth of kmsg data to an allocated page
and notify the hypervisor of the page address through the MSR.

- Sysctl option to control the behavior, with ON by default.

Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 1268ed0c 03-Jul-2018 K. Y. Srinivasan <kys@microsoft.com>

x86/hyper-v: Fix the circular dependency in IPI enlightenment

The IPI hypercalls depend on being able to map the Linux notion of CPU ID
to the hypervisor's notion of the CPU ID. The array hv_vp_index[] provides
this mapping. Code for populating this array depends on the IPI functionality.
Break this circular dependency.

[ tglx: Use a proper define instead of '-1' with a u32 variable as pointed
out by Vitaly ]
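
One way a dependency like this can be broken is to mark every mapping entry as invalid up front and have the IPI path refuse the hypercall route until the entries it needs are populated. The sketch below models that idea in user-space C. It is an assumption about the general approach, not a copy of the patch: VP_INVAL_MODEL, vp_index_model[], NR_CPUS_MODEL and can_use_ipi_hypercall() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL  4            /* hypothetical CPU count for this model */
#define VP_INVAL_MODEL UINT32_MAX   /* named sentinel instead of a bare -1 in a u32 */

/* Hypothetical stand-in for the hv_vp_index[] mapping. */
uint32_t vp_index_model[NR_CPUS_MODEL];

/* Mark every entry invalid before any CPU has registered its VP index. */
void vp_index_init_model(void)
{
    for (size_t i = 0; i < NR_CPUS_MODEL; i++)
        vp_index_model[i] = VP_INVAL_MODEL;
}

/* True only when every target CPU already has a valid VP index. */
bool can_use_ipi_hypercall(const unsigned int *cpus, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(cpus[i] < NR_CPUS_MODEL);
        if (vp_index_model[cpus[i]] == VP_INVAL_MODEL)
            return false;
    }
    return true;
}
```

While the array is still being filled in, callers get false and can use the ordinary IPI path, so populating the array no longer requires the hypercall-based IPIs to work.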

Fixes: 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments")
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Cc: gregkh@linuxfoundation.org
Cc: devel@linuxdriverproject.org
Cc: olaf@aepfle.de
Cc: apw@canonical.com
Cc: jasowang@redhat.com
Cc: hpa@zytor.com
Cc: sthemmin@microsoft.com
Cc: Michael.H.Kelley@microsoft.com
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180703230155.15160-1-kys@linuxonhyperv.com


# 7dc9b6b8 05-Jun-2018 Michael Kelley <mikelley@microsoft.com>

Drivers: hv: vmbus: Make TLFS #define names architecture neutral

The Hyper-V feature and hint flags in hyperv-tlfs.h are all defined
with the string "X64" in the name. Some of these flags are indeed
x86/x64 specific, but others are not. For the ones that are used
in architecture independent Hyper-V driver code, or will be used in
the upcoming support for Hyper-V for ARM64, this patch removes the
"X64" from the name.

This patch changes the flags that are currently known to be
used on multiple architectures. Hyper-V for ARM64 is still a
work-in-progress and the Top Level Functional Spec (TLFS) has not
been separated into x86/x64 and ARM64 areas. So additional flags
may need to be updated later.

This patch only changes symbol names. There are no functional
changes.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 9a2d78e2 16-May-2018 K. Y. Srinivasan <kys@microsoft.com>

X86/Hyper-V: Consolidate the allocation of the hypercall input page

Consolidate the allocation of the hypercall input page.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-5-kys@linuxonhyperv.com


# 68bb7bfb 16-May-2018 K. Y. Srinivasan <kys@microsoft.com>

X86/Hyper-V: Enable IPI enlightenments

Hyper-V supports hypercalls to implement IPI; use them.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-2-kys@linuxonhyperv.com


# 6b48cb5f 16-May-2018 K. Y. Srinivasan <kys@microsoft.com>

X86/Hyper-V: Enlighten APIC access

Hyper-V supports MSR based APIC access; implement
the enlightenment.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: olaf@aepfle.de
Cc: sthemmin@microsoft.com
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: Michael.H.Kelley@microsoft.com
Cc: hpa@zytor.com
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: vkuznets@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-1-kys@linuxonhyperv.com


# a46d15cc 20-Mar-2018 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyper-v: allocate and use Virtual Processor Assist Pages

Virtual Processor Assist Pages usage allows us to do optimized EOI
processing for APIC, enable Enlightened VMCS support in KVM and more.
struct hv_vp_assist_page is defined according to the Hyper-V TLFS v5.0b.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>


# 5a485803 20-Mar-2018 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyper-v: move hyperv.h out of uapi

hyperv.h is not part of uapi, there are no (known) users outside of kernel.
We are making changes to this file to match current Hyper-V Hypervisor
Top-Level Functional Specification (TLFS, see:
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs)
and we don't want to maintain backwards compatibility.

Move the file, renaming it to hyperv-tlfs.h to avoid confusing it with
mshyperv.h. In future, all definitions from TLFS should go to it and
all kernel objects should go to mshyperv.h or include/linux/hyperv.h.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>


# 51d4e5da 24-Jan-2018 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/irq: Count Hyper-V reenlightenment interrupts

Hyper-V reenlightenment interrupts arrive when the VM is migrated. While
they are not interesting in general, they are important when L2 nested
guests are running.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-6-vkuznets@redhat.com


# e7c4e36c 24-Jan-2018 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyperv: Redirect reenlightment notifications on CPU offlining

It is very unlikely for CPUs to get offlined when running on Hyper-V as
there is a protection in the vmbus module which prevents it when the guest
has any VMBus devices assigned. This, however, may change in future if an
option to reassign an already active channel is added. It is also
possible to run without any Hyper-V devices or to have a CPU with no
assigned channels.

Reassign reenlightenment notifications to some other active CPU when the
CPU which is assigned to them goes offline.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-5-vkuznets@redhat.com


# 93286261 24-Jan-2018 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyperv: Reenlightenment notifications support

Hyper-V supports Live Migration notification. This is supposed to be used
in conjunction with TSC emulation: when a VM is migrated to a host with a
different TSC frequency, for some short period the host emulates the
accesses to TSC and sends an interrupt to notify about the event. When the
guest is done updating everything it can disable TSC emulation and
everything will start working fast again.

These notifications weren't required until now as Hyper-V guests are not
supposed to use TSC as a clocksource: in Linux the TSC is even marked as
unstable on boot. Guests normally use the 'tsc page' clocksource and the
host updates its values on migrations automatically.

Things change with nested virtualization: even when the PV
clocksources (kvm-clock or tsc page) are passed through to the nested
guests, the TSC frequency and frequency changes need to be known.

Hyper-V Top Level Functional Specification (as of v5.0b) wrongly specifies
EAX:BIT(12) of CPUID:0x40000009 as the feature identification bit. The
right one to check is EAX:BIT(13) of CPUID:0x40000003. I was assured that
the fix is on the way.
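
The bit test described above can be sketched in plain C. This is only an
illustration, assuming the caller has already obtained the EAX value of CPUID
leaf 0x40000003; the macro and function names are hypothetical and are not
taken from the kernel sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Leaf and bit number as stated in the commit message. */
#define SKETCH_HV_FEATURES_LEAF        0x40000003u
#define SKETCH_REENLIGHTENMENT_EAX_BIT 13u /* hypothetical macro name */

/*
 * Takes the EAX value previously read from CPUID leaf 0x40000003 and
 * reports whether bit 13 is set. Executing CPUID itself is left to the
 * caller so that the helper stays a pure function.
 */
static bool sketch_has_reenlightenment(uint32_t eax)
{
	return ((eax >> SKETCH_REENLIGHTENMENT_EAX_BIT) & 1u) != 0;
}
```

Keeping the helper free of the CPUID instruction makes it easy to exercise
with hand-built EAX values.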

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-4-vkuznets@redhat.com


# e2768eaa 24-Jan-2018 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyperv: Add a function to read both TSC and TSC page value simulateneously

This is going to be used from KVM code where both TSC and TSC page value
are needed.

Nothing is supposed to use the function when Hyper-V code is compiled out,
so just BUG() in that case.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-3-vkuznets@redhat.com


# 89a8f6d4 24-Jan-2018 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyperv: Check for required priviliges in hyperv_init()

hyperv_init() presumes that it always has access to the VP index and
hypercall MSRs, while according to the specification it must check whether
access to the corresponding MSRs is allowed before accessing them.
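
A minimal sketch of such a check, assuming the privilege flags arrive as a
32-bit feature word and that the hypercall and VP index MSR privileges sit at
bits 5 and 6. Both bit positions and all names below are assumptions made for
illustration and should be verified against the TLFS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in the feature word; check them against the TLFS. */
#define SKETCH_HYPERCALL_MSR_AVAILABLE (1u << 5)
#define SKETCH_VP_INDEX_MSR_AVAILABLE  (1u << 6)

/*
 * Returns true only when both privileges are granted, so that the caller
 * can bail out of initialization before touching either MSR.
 */
static bool sketch_required_msrs_available(uint32_t features)
{
	const uint32_t required = SKETCH_HYPERCALL_MSR_AVAILABLE |
				  SKETCH_VP_INDEX_MSR_AVAILABLE;

	return (features & required) == required;
}
```

Testing both bits with a single mask keeps the early-exit condition in one
place.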

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-2-vkuznets@redhat.com


# 4a5f3cde 22-Dec-2017 Michael Kelley <mikelley@microsoft.com>

Drivers: hv: vmbus: Remove x86-isms from arch independent drivers

hv_is_hypercall_page_setup() is used to check if Hyper-V is
initialized, but a 'hypercall page' is an x86 implementation detail
that isn't necessarily present on other architectures. Rename to the
architecture independent hv_is_hyperv_initialized() and add a check
that x86_hyper is pointing to Hyper-V. Use this function instead of
direct references to x86-specific data structures in vmbus_drv.c,
and remove the now redundant call in hv_init(). Also remove 'x86' from
the string name passed to cpuhp_setup_state().

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 03b2a320 09-Nov-2017 Juergen Gross <jgross@suse.com>

x86/virt: Add enum for hypervisors to replace x86_hyper

The x86_hyper pointer is only used for checking whether a virtual
device is supporting the hypervisor the system is running on.

Use an enum for that purpose instead and drop the x86_hyper pointer.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Xavier Deguillard <xdeguillard@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akataria@vmware.com
Cc: arnd@arndb.de
Cc: boris.ostrovsky@oracle.com
Cc: devel@linuxdriverproject.org
Cc: dmitry.torokhov@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: haiyangz@microsoft.com
Cc: kvm@vger.kernel.org
Cc: kys@microsoft.com
Cc: linux-graphics-maintainer@vmware.com
Cc: linux-input@vger.kernel.org
Cc: moltmann@vmware.com
Cc: pbonzini@redhat.com
Cc: pv-drivers@vmware.com
Cc: rkrcmar@redhat.com
Cc: sthemmin@microsoft.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20171109132739.23465-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7ed4325a 29-Oct-2017 K. Y. Srinivasan <kys@microsoft.com>

Drivers: hv: vmbus: Make panic reporting to be more useful

Hyper-V allows the guest to report panic and the guest can pass additional
information. All this is logged on the host. Currently Linux is passing back
information that is not particularly useful. Make the following changes:

1. Windows uses crash MSR P0 to report bugcheck code. Follow the same
convention for Linux as well.
2. It will be useful to know the guest ID of the Linux guest that has
panicked. Pass back this information.

These changes will help in better supporting Linux on Hyper-V.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# a3b74243 06-Oct-2017 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyperv: Clear vCPU banks between calls to avoid flushing unneeded vCPUs

hv_flush_pcpu_ex structures are not cleared between calls for performance
reasons (they're variable size up to PAGE_SIZE each) but we must clear
hv_vp_set.bank_contents part of it to avoid flushing unneeded vCPUs. The
rest of the structure is formed correctly.

To do the clearing in an efficient way stash the maximum possible vCPU
number (this may differ from Linux CPU id).
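
The partial clearing can be sketched as follows. This is a user-space
illustration, assuming 64 vCPUs per bank; the structure layout, the bank limit
and every name are hypothetical stand-ins rather than the kernel's
definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_VCPUS_PER_BANK 64u
#define SKETCH_MAX_BANKS      64u

/* Hypothetical stand-in for the bank portion of the flush structure. */
struct sketch_vp_set {
	uint64_t valid_bank_mask;
	uint64_t bank_contents[SKETCH_MAX_BANKS];
};

/* Number of banks that can hold a bit for any vCPU index <= max_vp_index. */
static size_t sketch_banks_in_use(uint32_t max_vp_index)
{
	return max_vp_index / SKETCH_VCPUS_PER_BANK + 1;
}

/*
 * Zero only the banks that can ever be populated instead of the whole
 * array, which keeps the per-call cost proportional to the vCPU count.
 */
static void sketch_clear_banks(struct sketch_vp_set *set, uint32_t max_vp_index)
{
	size_t n = sketch_banks_in_use(max_vp_index);

	if (n > SKETCH_MAX_BANKS)
		n = SKETCH_MAX_BANKS;
	memset(set->bank_contents, 0, n * sizeof(set->bank_contents[0]));
}
```

With a stashed maximum index of 70, for example, only the first two banks are
cleared and the remaining entries are left untouched.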

Reported-by: Jork Loeser <Jork.Loeser@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20171006154854.18092-1-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2ffd9e33 02-Aug-2017 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyper-v: Use hypercall for remote TLB flush

The Hyper-V host can suggest that we use a hypercall for remote TLB flushes;
this is supposed to work faster than IPIs.

Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
we need to put the input somewhere in memory and we don't really want to
have memory allocation on each call so we pre-allocate per cpu memory areas
on boot.

pv_ops patching is happening very early so we need to separate
hyperv_setup_mmu_ops() and hyper_alloc_mmu().

It is possible and easy to implement local TLB flushing too and there is
even a hint for that. However, I don't see room for optimization on the
host side as both hypercall and native TLB flush will result in vmexit. The
hint is also not set on modern Hyper-V versions.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170802160921.21791-8-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7415aea6 02-Aug-2017 Vitaly Kuznetsov <vkuznets@redhat.com>

hyper-v: Globalize vp_index

To support implementing remote TLB flushing on Hyper-V with a hypercall
we need to make vp_index available outside of vmbus module. Rename and
globalize.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170802160921.21791-7-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# fc53662f 02-Aug-2017 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyper-v: Make hv_do_hypercall() inline

We have only three call sites for hv_do_hypercall() and we're going to
change HVCALL_SIGNAL_EVENT to use a fast hypercall, so we can inline this
function for optimization.

The Hyper-V Top Level Functional Specification states that the r9-r11
registers and flags may be clobbered by the hypervisor during a hypercall.
With inlining this is somewhat important, so add the clobbers.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170802160921.21791-3-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 67071816 04-Mar-2017 Stephen Hemminger <stephen@networkplumber.org>

hyperv: fix warning about missing prototype

Compiling with warnings enabled finds a missing prototype for
hv_do_hypercall.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 90b20432 03-Mar-2017 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

Hyper-V TSC page clocksource is suitable for vDSO, however, the protocol
defined by the hypervisor is different from VCLOCK_PVCLOCK. Implement the
required support by adding hvclock_page VVAR.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20170303132142.25595-4-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 0733379b 03-Mar-2017 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyperv: Move TSC reading method to asm/mshyperv.h

As preparation for making the Hyper-V TSC page suitable for vDSO, move
the TSC page reading logic to asm/mshyperv.h. While at it, do the
following:

- Document the reading algorithm.
- Simplify the code a bit.
- Add explicit READ_ONCE() to not rely on 'volatile'.
- Add explicit barriers to prevent re-ordering (we need to read sequence
strictly before and after)
- Use mul_u64_u64_shr() instead of assembly, gcc generates a single 'mul'
instruction on x86_64 anyway.
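
The points in the list above can be sketched as a sequence-checked read loop.
This is a user-space approximation, not the kernel code: C11 atomics stand in
for READ_ONCE() and the explicit barriers, a 128-bit multiply (a GCC/Clang
extension) stands in for mul_u64_u64_shr(), and the structure layout, the
function names and the fixed TSC stub are all hypothetical. Any handling of an
invalid page is left out.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout; the real TSC page is defined by the hypervisor. */
struct sketch_tsc_page {
	_Atomic uint32_t sequence;
	uint64_t scale;
	int64_t offset;
};

/* Stand-in for reading the hardware TSC so that the sketch is deterministic. */
static uint64_t sketch_fixed_tsc(void)
{
	return 1000;
}

/* Equivalent of (a * b) >> 64 computed through a 128-bit intermediate. */
static uint64_t sketch_mul_shr64(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

static uint64_t sketch_read_tsc_page(struct sketch_tsc_page *pg,
				     uint64_t (*read_tsc)(void))
{
	uint32_t seq;
	uint64_t scale, tsc;
	int64_t offset;

	do {
		/* Read the sequence strictly before the payload. */
		seq = atomic_load_explicit(&pg->sequence, memory_order_acquire);
		scale = pg->scale;
		offset = pg->offset;
		tsc = read_tsc();
		/* Order the payload reads before the sequence re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&pg->sequence, memory_order_relaxed) != seq);

	return sketch_mul_shr64(tsc, scale) + (uint64_t)offset;
}
```

A retry happens only when the sequence changed while the scale and offset were
being read, so an undisturbed read completes in a single pass.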

[ tglx: Simplified the loop ]

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20170303132142.25595-3-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# bd2a9ada 03-Mar-2017 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/hyperv: Implement hv_get_tsc_page()

To use Hyper-V TSC page clocksource from vDSO we need to make tsc_pg
available. Implement hv_get_tsc_page() and add CONFIG_HYPERV_TSCPAGE to
make #ifdef-s simple.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20170303132142.25595-2-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 73667e31 14-Feb-2017 Arnd Bergmann <arnd@arndb.de>

x86/hyperv: Hide unused label

This new 32-bit warning just showed up:

arch/x86/hyperv/hv_init.c: In function 'hyperv_init':
arch/x86/hyperv/hv_init.c:167:1: error: label 'register_msr_cs' defined but not used [-Werror=unused-label]

The easiest solution is to move the label up into the existing #ifdef that
has the goto.
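
The shape of that fix can be illustrated with a reduced example. It assumes a
hypothetical SKETCH_64BIT macro in place of the real configuration symbol, and
the function is not the kernel's hyperv_init(); only the placement of the
label relative to the #ifdef matters.

```c
#include <stdbool.h>

/*
 * Returns 1 when the TSC page path is taken and 0 for the MSR fallback.
 * The label lives inside the same #ifdef as the only goto that targets it,
 * so a build without SKETCH_64BIT sees neither and emits no unused-label
 * diagnostic.
 */
static int sketch_pick_clocksource(bool tsc_page_usable)
{
#ifdef SKETCH_64BIT
	if (!tsc_page_usable)
		goto register_msr_cs;
	return 1;

register_msr_cs:
#endif
	(void)tsc_page_usable; /* silences the unused-parameter warning */
	return 0;
}
```

Had the label been placed after the #endif, the build without the macro would
contain a label that no goto reaches, which is the situation the warning
reports.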

Fixes: dee863b571b0 ("hv: export current Hyper-V clocksource")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: devel@linuxdriverproject.org
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Link: http://lkml.kernel.org/r/20170214211736.2641241-1-arnd@arndb.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 372b1e91 08-Feb-2017 K. Y. Srinivasan <kys@microsoft.com>

drivers: hv: Turn off write permission on the hypercall page

The hypercall page only needs to be executable but currently it is set up to
be writable as well. Fix the issue.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# dee863b5 04-Feb-2017 Vitaly Kuznetsov <vkuznets@redhat.com>

hv: export current Hyper-V clocksource

As preparation for implementing a Hyper-V PTP device supporting
.getcrosststamp we need to export a reference to the current Hyper-V
clocksource in use (MSR or TSC page).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 5647dbf8 28-Jan-2017 Vitaly Kuznetsov <vkuznets@redhat.com>

Drivers: hv: restore TSC page cleanup before kexec

We need to clean up the TSC page before doing kexec/kdump or the new kernel
may crash if it tries to use it.

Fixes: 63ed4e0c67df ("Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d6f3609d 28-Jan-2017 Vitaly Kuznetsov <vkuznets@redhat.com>

Drivers: hv: restore hypervcall page cleanup before kexec

We need to clean up the hypercall page before doing kexec/kdump or the new
kernel may crash if it tries to use it. Reuse the now-empty hv_cleanup
function, renaming it to hyperv_cleanup and moving it to the arch specific
code.

Fixes: 8730046c1498 ("Drivers: hv vmbus: Move Hypercall page setup out of common code")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 73638cdd 19-Jan-2017 K. Y. Srinivasan <kys@microsoft.com>

Drivers: hv: vmbus: Move the check for hypercall page setup

As part of the effort to separate out architecture specific code, move the
check for detecting if the hypercall page is set up.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d058fa7e 19-Jan-2017 K. Y. Srinivasan <kys@microsoft.com>

Drivers: hv: vmbus: Move the crash notification function

As part of the effort to separate out architecture specific code, move the
crash notification function.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 63ed4e0c 19-Jan-2017 K. Y. Srinivasan <kys@microsoft.com>

Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code

As part of the effort to separate out architecture specific code,
consolidate all Hyper-V specific clocksource code into architecture
specific code.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 6ab42a66 18-Jan-2017 K. Y. Srinivasan <kys@microsoft.com>

Drivers: hv: vmbus: Move Hypercall invocation code out of common code

As part of the effort to separate out architecture specific code, move the
hypercall invocation code to an architecture specific file.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 8730046c 18-Jan-2017 K. Y. Srinivasan <kys@microsoft.com>

Drivers: hv vmbus: Move Hypercall page setup out of common code

As part of the effort to separate out architecture specific code, move the
hypercall page setup to an architecture specific file.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>