#
284851ee |
|
16-Feb-2024 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: Get rid of return value from kvm_arch_create_vm_debugfs() The general expectation with debugfs is that any initialization failure is nonfatal. Nevertheless, kvm_arch_create_vm_debugfs() allows implementations to return an error and kvm_create_vm_debugfs() allows that to fail VM creation. Change to a void return to discourage architectures from making debugfs failures fatal for the VM. Seems like everyone already had the right idea, as all implementations already return 0 unconditionally. Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20240216155941.2029458-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
c5b31cc2 |
|
17-Oct-2023 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: remove CONFIG_HAVE_KVM_IRQFD All platforms with a kernel irqchip have support for irqfd. Unify the two configuration items so that userspace can expect to use irqfd to inject interrupts into the irqchip. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
63912245 |
|
15-Mar-2023 |
Wei Wang <wei.w.wang@intel.com> |
KVM: move KVM_CAP_DEVICE_CTRL to the generic check KVM_CAP_DEVICE_CTRL allows userspace to check if the kvm_device framework (e.g. KVM_CREATE_DEVICE) is supported by KVM. Move KVM_CAP_DEVICE_CTRL to the generic check for the two reasons: 1) it already supports arch agnostic usages (i.e. KVM_DEV_TYPE_VFIO). For example, userspace VFIO implementation may needs to create KVM_DEV_TYPE_VFIO on x86, riscv, or arm etc. It is simpler to have it checked at the generic code than at each arch's code. 2) KVM_CREATE_DEVICE has been added to the generic code. Link: https://lore.kernel.org/all/20221215115207.14784-1-wei.w.wang@intel.com Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Acked-by: Anup Patel <anup@brainfault.org> (riscv) Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lore.kernel.org/r/20230315101606.10636-1-wei.w.wang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
f128cf8c |
|
27-Oct-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIER Convert KVM_ARCH_WANT_MMU_NOTIFIER into a Kconfig and select it where appropriate to effectively maintain existing behavior. Using a proper Kconfig will simplify building more functionality on top of KVM's mmu_notifier infrastructure. Add a forward declaration of kvm_gfn_range to kvm_types.h so that including arch/powerpc/include/asm/kvm_ppc.h's with CONFIG_KVM=n doesn't generate warnings due to kvm_gfn_range being undeclared. PPC defines hooks for PR vs. HV without guarding them via #ifdeffery, e.g. bool (*unmap_gfn_range)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*test_age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*set_spte_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); Alternatively, PPC could forward declare kvm_gfn_range, but there's no good reason not to define it in common KVM. Acked-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4a2e993f |
|
27-Oct-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: PPC: Return '1' unconditionally for KVM_CAP_SYNC_MMU Advertise that KVM's MMU is synchronized with the primary MMU for all flavors of PPC KVM support, i.e. advertise that the MMU is synchronized when CONFIG_KVM_BOOK3S_HV_POSSIBLE=y but the VM is not using hypervisor mode (a.k.a. PR VMs). PR VMs, via kvm_unmap_gfn_range_pr(), do the right thing for mmu_notifier invalidation events, and more tellingly, KVM returns '1' for KVM_CAP_SYNC_MMU when CONFIG_KVM_BOOK3S_HV_POSSIBLE=n and CONFIG_KVM_BOOK3S_PR_POSSIBLE=y, i.e. KVM already advertises a synchronized MMU for PR VMs, just not when CONFIG_KVM_BOOK3S_HV_POSSIBLE=y. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20231027182217.3615211-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1853d750 |
|
27-Oct-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: PPC: Drop dead code related to KVM_ARCH_WANT_MMU_NOTIFIER Assert that both KVM_ARCH_WANT_MMU_NOTIFIER and CONFIG_MMU_NOTIFIER are defined when KVM is enabled, and return '1' unconditionally for the CONFIG_KVM_BOOK3S_HV_POSSIBLE=n path. All flavors of PPC support for KVM select MMU_NOTIFIER, and KVM_ARCH_WANT_MMU_NOTIFIER is unconditionally defined by arch/powerpc/include/asm/kvm_host.h. Effectively dropping use of KVM_ARCH_WANT_MMU_NOTIFIER will simplify a future cleanup to turn KVM_ARCH_WANT_MMU_NOTIFIER into a Kconfig, i.e. will allow combining all of the #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) checks into a single #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER without having to worry about PPC's "bare" usage of KVM_ARCH_WANT_MMU_NOTIFIER. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7028ac8d |
|
13-Sep-2023 |
Jordan Niethe <jniethe5@gmail.com> |
KVM: PPC: Use accessors for VCPU registers Introduce accessor generator macros for VCPU registers. Use the accessor functions to replace direct accesses to this registers. This will be important later for Nested APIv2 support which requires additional functionality for accessing and modifying VCPU state. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230914030600.16993-5-jniethe5@gmail.com
|
#
52425a3b |
|
13-Sep-2023 |
Jordan Niethe <jniethe5@gmail.com> |
KVM: PPC: Introduce FPR/VR accessor functions Introduce accessor functions for floating point and vector registers like the ones that exist for GPRs. Use these to replace the existing FPR and VR accessor macros. This will be important later for Nested APIv2 support which requires additional functionality for accessing and modifying VCPU state. Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230914030600.16993-3-jniethe5@gmail.com
|
#
d8708b80 |
|
08-Feb-2023 |
Thomas Huth <thuth@redhat.com> |
KVM: Change return type of kvm_arch_vm_ioctl() to "int" All kvm_arch_vm_ioctl() implementations now only deal with "int" types as return values, so we can change the return type of these functions to use "int" instead of "long". Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20230208140105.655814-7-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
acf17878 |
|
07-Mar-2023 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Make kvmppc_get_last_inst() produce a ppc_inst_t This changes kvmppc_get_last_inst() so that the instruction it fetches is returned in a ppc_inst_t variable rather than a u32. This will allow us to return a 64-bit prefixed instruction on those 64-bit machines that implement Power ISA v3.1 or later, such as POWER10. On 32-bit platforms, ppc_inst_t is 32 bits wide, and is turned back into a u32 by ppc_inst_val, which is an identity operation on those platforms. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/ZAgsiPlL9O7KnlZZ@cleo
|
#
6cd5c1db |
|
30-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Set SRR1[PREFIX] bit on injected interrupts Pass the hypervisor (H)SRR1[PREFIX] indication through to synchronous interrupts injected into the guest. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230330103224.3589928-3-npiggin@gmail.com
|
#
460ba21d |
|
30-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Permit SRR1 flags in more injected interrupt types The prefix architecture in ISA v3.1 introduces a prefixed bit in SRR1 for many types of synchronous interrupts which is set when the interrupt is caused by a prefixed instruction. This requires KVM to be able to set this bit when injecting interrupts into a guest. Plumb through the SRR1 "flags" argument to the core_queue APIs where it's missing for this. For now they are set to 0, which is no change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup kvmppc_core_queue_alignment() in booke.c] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230330103224.3589928-2-npiggin@gmail.com
|
#
52882b9c |
|
04-May-2022 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent When introduced, IRQFD resampling worked on POWER8 with XICS. However KVM on POWER9 has never implemented it - the compatibility mode code ("XICS-on-XIVE") misses the kvm_notify_acked_irq() call and the native XIVE mode does not handle INTx in KVM at all. This moved the capability support advertising to platforms and stops advertising it on XIVE, i.e. POWER9 and later. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Anup Patel <anup@brainfault.org> Acked-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20220504074807.3616813-1-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
441f7bfa |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: Opt out of generic hardware enabling on s390 and PPC Allow architectures to opt out of the generic hardware enabling logic, and opt out on both s390 and PPC, which don't need to manually enable virtualization as it's always on (when available). In addition to letting s390 and PPC drop a bit of dead code, this will hopefully also allow ARM to clean up its related code, e.g. ARM has its own per-CPU flag to track which CPUs have enable hardware due to the need to keep hardware enabled indefinitely when pKVM is enabled. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20221130230934.1014142-50-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
81a1cf9f |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: Drop kvm_arch_check_processor_compat() hook Drop kvm_arch_check_processor_compat() and its support code now that all architecture implementations are nops. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390 Acked-by: Anup Patel <anup@brainfault.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-Id: <20221130230934.1014142-33-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a578a0a9 |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: Drop kvm_arch_{init,exit}() hooks Drop kvm_arch_init() and kvm_arch_exit() now that all implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20221130230934.1014142-30-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ae19b15d |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: PPC: Move processor compatibility check to module init Move KVM PPC's compatibility checks to their respective module_init() hooks, there's no need to wait until KVM's common compat check, nor is there a need to perform the check on every CPU (provided by common KVM's hook), as the compatibility checks operate on global data. arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec; arch/powerpc/kvm/book3s.c: return 0 arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2") arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc") strcmp(cur_cpu_spec->cpu_name, "e5500") strcmp(cur_cpu_spec->cpu_name, "e6500") Cc: Fabiano Rosas <farosas@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Message-Id: <20221130230934.1014142-27-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
63a1bd8a |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: Drop arch hardware (un)setup hooks Drop kvm_arch_hardware_setup() and kvm_arch_hardware_unsetup() now that all implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390 Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20221130230934.1014142-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d663b8a2 |
|
03-Nov-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: replace direct irq.h inclusion virt/kvm/irqchip.c is including "irq.h" from the arch-specific KVM source directory (i.e. not from arch/*/include) for the sole purpose of retrieving irqchip_in_kernel. Making the function inline in a header that is already included, such as asm/kvm_host.h, is not possible because it needs to look at struct kvm which is defined after asm/kvm_host.h is included. So add a kvm_arch_irqchip_in_kernel non-inline function; irqchip_in_kernel() is only performance critical on arm64 and x86, and the non-inline function is enough on all other architectures. irq.h can then be deleted from all architectures except x86. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0a5bfb82 |
|
16-Aug-2022 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Book3S HV: Fix decrementer migration We used to have a workaround[1] for a hang during migration that was made ineffective when we converted the decrementer expiry to be relative to guest timebase. The point of the workaround was that in the absence of an explicit decrementer expiry value provided by userspace during migration, KVM needs to initialize dec_expires to a value that will result in an expired decrementer after subtracting the current guest timebase. That stops the vcpu from hanging after migration due to a decrementer that's too large. If the dec_expires is now relative to guest timebase, its initialization needs to be guest timebase-relative as well, otherwise we end up with a decrementer expiry that is still larger than the guest timebase. 1- https://git.kernel.org/torvalds/c/5855564c8ab2 Fixes: 3c1a4322bba7 ("KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebase") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220816222517.1916391-1-farosas@linux.ibm.com
|
#
c59fb127 |
|
20-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: remove KVM_REQ_UNHALT KVM_REQ_UNHALT is now unnecessary because it is replaced by the return value of kvm_vcpu_block/kvm_vcpu_halt. Remove it. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Message-Id: <20220921003201.1441511-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
113fe88e |
|
11-Jun-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Don't include asm/setup.h in asm/machdep.h asm/machdep.h doesn't need asm/setup.h Remove it. Add it directly in files that needs it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3b1dfb19a2c3265fb4abc2bfc7b6eae9261a998b.1654966508.git.christophe.leroy@csgroup.eu
|
#
6ba2a292 |
|
23-Jan-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Use IDA allocator for LPID allocator This removes the fixed-size lpid_inuse array. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-4-npiggin@gmail.com
|
#
18827eee |
|
23-Jan-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Remove kvmppc_claim_lpid Removing kvmppc_claim_lpid makes the lpid allocator API a bit simpler to change the underlying implementation in a future patch. The host LPID is always 0, so that can be a detail of the allocator. If the allocator range is restricted, that can reserve LPIDs at the top of the range. This allows kvmppc_claim_lpid to be removed. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-2-npiggin@gmail.com
|
#
e6f6390a |
|
08-Mar-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Add missing headers Don't inherit headers "by chances" from asm/prom.h, asm/mpc52xx.h, asm/pci.h etc... Include the needed headers, and remove asm/prom.h when it was needed exclusively for pulling necessary headers. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/be8bdc934d152a7d8ee8d1a840d5596e2f7d85e0.1646767214.git.christophe.leroy@csgroup.eu
|
#
2031f287 |
|
14-Apr-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused Add wrappers to acquire/release KVM's SRCU lock when stashing the index in vcpu->src_idx, along with rudimentary detection of illegal usage, e.g. re-acquiring SRCU and thus overwriting vcpu->src_idx. Because the SRCU index is (currently) either 0 or 1, illegal nesting bugs can go unnoticed for quite some time and only cause problems when the nested lock happens to get a different index. Wrap the WARNs in PROVE_RCU=y, and make them ONCE, otherwise KVM will likely yell so loudly that it will bring the kernel to its knees. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Fabiano Rosas <farosas@linux.ibm.com> Message-Id: <20220415004343.2203171-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f771b557 |
|
06-Mar-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Use KVM_CAP_PPC_AIL_MODE_3 Use KVM_CAP_PPC_AIL_MODE_3 to advertise the capability to set the AIL resource mode to 3 with the H_SET_MODE hypercall. This capability differs between processor types and KVM types (PR, HV, Nested HV), and affects guest-visible behaviour. QEMU will implement a cap-ail-mode-3 to control this behaviour[1], and use the KVM CAP if available to determine KVM support[2]. [1] https://lists.nongnu.org/archive/html/qemu-ppc/2022-02/msg00437.html [2] https://lists.nongnu.org/archive/html/qemu-ppc/2022-02/msg00439.html Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> [mpe: Rebase onto 93b71801a827 from kvm-ppc-cap-210 branch, add EXPORT_SYMBOL] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220222064727.2314380-4-npiggin@gmail.com
|
#
4feb74aa |
|
24-Jan-2022 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Decrement module refcount if init_vm fails We increment the reference count for KVM-HV/PR before the call to kvmppc_core_init_vm. If that function fails we need to decrement the refcount. Also remove the check on kvm_ops->owner because try_module_get can handle a NULL module. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125155735.1018683-5-farosas@linux.ibm.com
|
#
faf01aef |
|
10-Jan-2022 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Merge powerpc's debugfs entry content into generic entry At the moment KVM on PPC creates 4 types of entries under the kvm debugfs: 1) "%pid-%fd" per a KVM instance (for all platforms); 2) "vm%pid" (for PPC Book3s HV KVM); 3) "vm%u_vcpu%u_timing" (for PPC Book3e KVM); 4) "kvm-xive-%p" (for XIVE PPC Book3s KVM, the same for XICS); The problem with this is that multiple VMs per process is not allowed for 2) and 3) which makes it possible for userspace to trigger errors when creating duplicated debugfs entries. This merges all these into 1). This defines kvm_arch_create_kvm_debugfs() similar to kvm_arch_create_vcpu_debugfs(). This defines 2 hooks in kvmppc_ops that allow specific KVM implementations add necessary entries, this adds the _e500 suffix to kvmppc_create_vcpu_debugfs_e500() to make it clear what platform it is for. This makes use of already existing kvm_arch_create_vcpu_debugfs() on PPC. This removes no more used debugfs_dir pointers from PPC kvm_arch structs. This stops removing vcpu entries as once created vcpus stay around for the entire life of a VM and removed when the KVM instance is closed, see commit d56f5136b010 ("KVM: let kvm_destroy_vm_debugfs clean up vCPU debugfs directories"). Suggested-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220111005404.162219-1-aik@ozlabs.ru
|
#
c1c8a663 |
|
25-Jan-2022 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Book3s: mmio: Deliver DSI after emulation failure MMIO emulation can fail if the guest uses an instruction that we are not prepared to emulate. Since these instructions can be and most likely are valid ones, this is (slightly) closer to an access fault than to an illegal instruction, so deliver a Data Storage interrupt instead of a Program interrupt. BookE ignores bad faults, so it will keep using a Program interrupt because a DSI would cause a fault loop in the guest. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125215655.1026224-6-farosas@linux.ibm.com
|
#
349fbfe9 |
|
25-Jan-2022 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: mmio: Return to guest after emulation failure If MMIO emulation fails we don't want to crash the whole guest by returning to userspace. The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM implementation") added a todo: /* XXX Deliver Program interrupt to guest. */ and later the commit d69614a295ae ("KVM: PPC: Separate loadstore emulation from priv emulation") added the Program interrupt injection but in another file, so I'm assuming it was missed that this block needed to be altered. Also change the message to a ratelimited one since we're letting the guest run and it could flood the host logs. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125215655.1026224-5-farosas@linux.ibm.com
|
#
3f831504 |
|
25-Jan-2022 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: mmio: Reject instructions that access more than mmio.data size The MMIO interface between the kernel and userspace uses a structure that supports a maximum of 8-bytes of data. Instructions that access more than that need to be emulated in parts. We currently don't have generic support for splitting the emulation in parts and each set of instructions needs to be explicitly included. There's already an error message being printed when a load or store exceeds the mmio.data buffer but we don't fail the emulation until later at kvmppc_complete_mmio_load and even then we allow userspace to make a partial copy of the data, which ends up overwriting some fields of the mmio structure. This patch makes the emulation fail earlier at kvmppc_handle_load|store, which will send a Program interrupt to the guest. This is better than allowing the guest to proceed with partial data. Note that this was caught in a somewhat artificial scenario using quadword instructions (lq/stq), there's no account of an actual guest in the wild running instructions that are not properly emulated. (While here, remove the "bad MMIO" messages. The caller already has an error message.) Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125215655.1026224-4-farosas@linux.ibm.com
|
#
b99234b9 |
|
25-Jan-2022 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Fix vmx/vsx mixup in mmio emulation The MMIO emulation code for vector instructions is duplicated between VSX and VMX. When emulating VMX we should check the VMX copy size instead of the VSX one. Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125215655.1026224-3-farosas@linux.ibm.com
|
#
36d014d3 |
|
25-Jan-2022 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Book3S HV: Stop returning internal values to userspace Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values to userspace, against the API of the KVM_RUN ioctl which returns 0 on success. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125215655.1026224-2-farosas@linux.ibm.com
|
#
91b99ea7 |
|
08-Oct-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: Rename kvm_vcpu_block() => kvm_vcpu_halt() Rename kvm_vcpu_block() to kvm_vcpu_halt() in preparation for splitting the actual "block" sequences into a separate helper (to be named kvm_vcpu_block()). x86 will use the standalone block-only path to handle non-halt cases where the vCPU is not runnable. Rename block_ns to halt_ns to match the new function name. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
510958e9 |
|
08-Oct-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: Force PPC to define its own rcuwait object Do not define/reference kvm_vcpu.wait if __KVM_HAVE_ARCH_WQP is true, and instead force the architecture (PPC) to define its own rcuwait object. Allowing common KVM to directly access vcpu->wait without a guard makes it all too easy to introduce potential bugs, e.g. kvm_vcpu_block(), kvm_vcpu_on_spin(), and async_pf_execute() all operate on vcpu->wait, not the result of kvm_arch_vcpu_get_wait(), and so may do the wrong thing for PPC. Due to PPC's shenanigans with respect to callbacks and waits (it switches to the virtual core's wait object at KVM_RUN!?!?), it's not clear whether or not this fixes any bugs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6a99c6e3 |
|
06-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: Stop passing kvm_userspace_memory_region to arch memslot hooks Drop the @mem param from kvm_arch_{prepare,commit}_memory_region() now that its use has been removed in all architectures. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <aa5ed3e62c27e881d0d8bc0acbc1572bc336dc19.1638817640.git.maciej.szmigiero@oracle.com>
|
#
eaaaed13 |
|
06-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: PPC: Avoid referencing userspace memory region in memslot updates For PPC HV, get the number of pages directly from the new memslot instead of computing the same from the userspace memory region, and explicitly check for !DELETE instead of inferring the same when toggling mmio_update. The motivation for these changes is to avoid referencing the @mem param so that it can be dropped in a future commit. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <1e97fb5198be25f98ef82e63a8d770c682264cc9.1638817639.git.maciej.szmigiero@oracle.com>
|
#
537a17b3 |
|
06-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: Let/force architectures to deal with arch specific memslot data Pass the "old" slot to kvm_arch_prepare_memory_region() and force arch code to handle propagating arch specific data from "new" to "old" when necessary. This is a baby step towards dynamically allocating "new" from the get go, and is a (very) minor performance boost on x86 due to not unnecessarily copying arch data. For PPC HV, copy the rmap in the !CREATE and !DELETE paths, i.e. for MOVE and FLAGS_ONLY. This is functionally a nop as the previous behavior would overwrite the pointer for CREATE, and eventually discard/ignore it for DELETE. For x86, copy the arch data only for FLAGS_ONLY changes. Unlike PPC HV, x86 needs to reallocate arch data in the MOVE case as the size of x86's allocations depend on the alignment of the memslot's gfn. Opportunistically tweak kvm_arch_prepare_memory_region()'s param order to match the "commit" prototype. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> [mss: add missing RISCV kvm_arch_prepare_memory_region() change] Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <67dea5f11bbcfd71e3da5986f11e87f5dd4013f9.1638817639.git.maciej.szmigiero@oracle.com>
|
#
27592ae8 |
|
16-Nov-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: Move wiping of the kvm->vcpus array to common code All architectures have similar loops iterating over the vcpus, freeing one vcpu at a time, and eventually wiping the reference off the vcpus array. They are also inconsistently taking the kvm->lock mutex when wiping the references from the array. Make this code common, which will simplify further changes. The locking is dropped altogether, as this should only be called when there is no further references on the kvm structure. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-2-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b7915d55 |
|
16-Nov-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS It doesn't make sense to return the recommended maximum number of vCPUs which exceeds the maximum possible number of vCPUs. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211116163443.88707-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2a24d80f |
|
14-Sep-2021 |
Nick Desaulniers <ndesaulniers@google.com> |
powerpc/asm: Remove UPD_CONSTR after GCC 4.9 removal UPD_CONSTR was previously a preprocessor define for an old GCC 4.9 inline asm bug with m<> constraints. Fixes: 6563139d90ad ("powerpc: remove GCC version check for UPD_CONSTR") Suggested-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210914161712.2463458-1-ndesaulniers@google.com
|
#
a1c42dde |
|
13-Sep-2021 |
Juergen Gross <jgross@suse.com> |
kvm: rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS KVM_MAX_VCPU_ID is not specifying the highest allowed vcpu-id, but the number of allowed vcpu-ids. This has already led to confusion, so rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS to make its semantics more clear Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210913135745.13944-3-jgross@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bc4188a2 |
|
15-Jul-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak vcpu_put is not called if the user copy fails. This can result in preempt notifier corruption and crashes, among other issues. Fixes: b3cebfe8c1ca ("KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210716024310.164448-2-npiggin@gmail.com
|
#
b87cc116 |
|
21-Jun-2021 |
Bharata B Rao <bharata@linux.ibm.com> |
KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability Now that we have H_RPT_INVALIDATE fully implemented, enable support for the same via KVM_CAP_PPC_RPT_INVALIDATE KVM capability Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-6-bharata@linux.ibm.com
|
#
9236f57a |
|
04-Jan-2021 |
Cédric Le Goater <clg@kaod.org> |
KVM: PPC: Make the VMX instruction emulation routines static These are only used locally. It fixes these W=1 compile errors : ../arch/powerpc/kvm/powerpc.c:1521:5: error: no previous prototype for ‘kvmppc_get_vmx_dword’ [-Werror=missing-prototypes] 1521 | int kvmppc_get_vmx_dword(struct kvm_vcpu *vcpu, int index, u64 *val) | ^~~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/powerpc.c:1539:5: error: no previous prototype for ‘kvmppc_get_vmx_word’ [-Werror=missing-prototypes] 1539 | int kvmppc_get_vmx_word(struct kvm_vcpu *vcpu, int index, u64 *val) | ^~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/powerpc.c:1557:5: error: no previous prototype for ‘kvmppc_get_vmx_hword’ [-Werror=missing-prototypes] 1557 | int kvmppc_get_vmx_hword(struct kvm_vcpu *vcpu, int index, u64 *val) | ^~~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/powerpc.c:1575:5: error: no previous prototype for ‘kvmppc_get_vmx_byte’ [-Werror=missing-prototypes] 1575 | int kvmppc_get_vmx_byte(struct kvm_vcpu *vcpu, int index, u64 *val) | ^~~~~~~~~~~~~~~~~~~ Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction mmio emulation with analyse_instr() input") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-19-clg@kaod.org
|
#
a722076e |
|
05-Feb-2021 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2 These machines don't support running both MMU types at the same time, so remove the KVM_CAP_PPC_MMU_HASH_V3 capability when the host is using Radix MMU. [paulus@ozlabs.org - added defensive check on kvmppc_hv_ops->hash_v3_possible] Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
d9a47eda |
|
16-Dec-2020 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR Introduce KVM_CAP_PPC_DAWR1 which can be used by QEMU to query whether KVM supports 2nd DAWR or not. The capability is by default disabled even when the underlying CPU supports 2nd DAWR. QEMU needs to check and enable it manually to use the feature. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
ff57698a |
|
22-Oct-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Fix update form addressing in inline assembly In several places, inline assembly uses the "%Un" modifier to enable the use of instruction with update form addressing, but the associated "<>" constraint is missing. As mentioned in previous patch, this fails with gcc 4.9, so "<>" can't be used directly. Use UPD_CONSTR macro everywhere %Un modifier is used. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/62eab5ca595485c192de1765bdac099f633a21d0.1603358942.git.christophe.leroy@csgroup.eu
|
#
1508c22f |
|
08-Jun-2020 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Protect kvm_vcpu_read_guest with srcu locks The kvm_vcpu_read_guest/kvm_vcpu_write_guest used for nested guests eventually call srcu_dereference_check to dereference a memslot and lockdep produces a warning as neither kvm->slots_lock nor kvm->srcu lock is held and kvm->users_count is above zero (>100 in fact). This wraps mentioned VCPU read/write helpers in srcu read lock/unlock as it is done in other places. This uses vcpu->srcu_idx when possible. These helpers are only used for nested KVM so this may explain why we did not see these before. Here is an example of a warning: ============================= WARNING: suspicious RCU usage 5.7.0-rc3-le_dma-bypass.3.2_a+fstn1 #897 Not tainted ----------------------------- include/linux/kvm_host.h:633 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by qemu-system-ppc/2752: #0: c000200359016be0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x144/0xd80 [kvm] stack backtrace: CPU: 80 PID: 2752 Comm: qemu-system-ppc Not tainted 5.7.0-rc3-le_dma-bypass.3.2_a+fstn1 #897 Call Trace: [c0002003591ab240] [c000000000b23ab4] dump_stack+0x190/0x25c (unreliable) [c0002003591ab2b0] [c00000000023f954] lockdep_rcu_suspicious+0x140/0x164 [c0002003591ab330] [c008000004a445f8] kvm_vcpu_gfn_to_memslot+0x4c0/0x510 [kvm] [c0002003591ab3a0] [c008000004a44c18] kvm_vcpu_read_guest+0xa0/0x180 [kvm] [c0002003591ab410] [c008000004ff9bd8] kvmhv_enter_nested_guest+0x90/0xb80 [kvm_hv] [c0002003591ab980] [c008000004fe07bc] kvmppc_pseries_do_hcall+0x7b4/0x1c30 [kvm_hv] [c0002003591aba10] [c008000004fe5d30] kvmppc_vcpu_run_hv+0x10a8/0x1a30 [kvm_hv] [c0002003591abae0] [c008000004a5d954] kvmppc_vcpu_run+0x4c/0x70 [kvm] [c0002003591abb10] [c008000004a56e54] kvm_arch_vcpu_ioctl_run+0x56c/0x7c0 [kvm] [c0002003591abba0] [c008000004a3ddc4] kvm_vcpu_ioctl+0x4ac/0xd80 [kvm] [c0002003591abd20] [c0000000006ebb58] ksys_ioctl+0x188/0x210 [c0002003591abd70] [c0000000006ebc28] sys_ioctl+0x48/0xb0 [c0002003591abdb0] [c000000000042764] system_call_exception+0x1d4/0x2e0 [c0002003591abe20] [c00000000000cce8] system_call_common+0xe8/0x214 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
3f649ab7 |
|
03-Jun-2020 |
Kees Cook <keescook@chromium.org> |
treewide: Remove uninitialized_var() usage Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
8c99d345 |
|
26-Apr-2020 |
Tianjia Zhang <tianjia.zhang@linux.alibaba.com> |
KVM: PPC: Clean up redundant 'kvm_run' parameters In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu' structure. For historical reasons, many kvm-related function parameters retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This patch does a unified cleanup of these remaining redundant parameters. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
da4ad88c |
|
23-Apr-2020 |
Davidlohr Bueso <dave@stgolabs.net> |
kvm: Replace vcpu->swait with rcuwait The use of any sort of waitqueue (simple or regular) for wait/waking vcpus has always been an overkill and semantically wrong. Because this is per-vcpu (which is blocked) there is only ever a single waiting vcpu, thus no need for any sort of queue. As such, make use of the rcuwait primitive, with the following considerations: - rcuwait already provides the proper barriers that serialize concurrent waiter and waker. - Task wakeup is done in rcu read critical region, with a stable task pointer. - Because there is no concurrency among waiters, we need not worry about rcuwait_wait_event() calls corrupting the wait->task. As a consequence, this saves the locking done in swait when modifying the queue. This also applies to per-vcore wait for powerpc kvm-hv. The x86 tscdeadline_latency test mentioned in 8577370fb0cb ("KVM: Use simple waitqueue for vcpu->wq") shows that, on avg, latency is reduced by around 15-20% with this change. Cc: Paul Mackerras <paulus@ozlabs.org> Cc: kvmarm@lists.cs.columbia.edu Cc: linux-mips@vger.kernel.org Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Message-Id: <20200424054837.5138-6-dave@stgolabs.net> [Avoid extra logic changes. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b9b2782c |
|
05-May-2020 |
Peter Xu <peterx@redhat.com> |
KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505154750.126300-1-peterx@redhat.com> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
495907ec |
|
05-May-2020 |
Peter Xu <peterx@redhat.com> |
KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505154750.126300-1-peterx@redhat.com> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1b94f6f8 |
|
15-Apr-2020 |
Tianjia Zhang <tianjia.zhang@linux.alibaba.com> |
KVM: Remove redundant argument to kvm_arch_vcpu_ioctl_run In earlier versions of kvm, 'kvm_run' was an independent structure and was not included in the vcpu structure. At present, 'kvm_run' is already included in the vcpu structure, so the parameter 'kvm_run' is redundant. This patch simplifies the function definition, removes the extra 'kvm_run' parameter, and extracts it from the 'kvm_vcpu' structure if necessary. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Message-Id: <20200416051057.26526-1-tianjia.zhang@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b9904085 |
|
21-Mar-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: Pass kvm_init()'s opaque param to additional arch funcs Pass @opaque to kvm_arch_hardware_setup() and kvm_arch_check_processor_compat() to allow architecture specific code to reference @opaque without having to stash it away in a temporary global variable. This will enable x86 to separate its vendor specific callback ops, which are passed via @opaque, into "init" and "runtime" ops without having to stash away the "init" ops. No functional change intended. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Tested-by: Cornelia Huck <cohuck@redhat.com> #s390 Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200321202603.19355-2-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9a5788c6 |
|
18-Mar-2020 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Add a capability for enabling secure guests At present, on Power systems with Protected Execution Facility hardware and an ultravisor, a KVM guest can transition to being a secure guest at will. Userspace (QEMU) has no way of knowing whether a host system is capable of running secure guests. This will present a problem in future when the ultravisor is capable of migrating secure guests from one host to another, because virtualization management software will have no way to ensure that secure guests only run in domains where all of the hosts can support secure guests. This adds a VM capability which has two functions: (a) userspace can query it to find out whether the host can support secure guests, and (b) userspace can enable it for a guest, which allows that guest to become a secure guest. If userspace does not enable it, KVM will return an error when the ultravisor does the hypercall that indicates that the guest is starting to transition to a secure guest. The ultravisor will then abort the transition and the guest will terminate. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Ram Pai <linuxram@us.ibm.com>
|
#
1d0c32ec |
|
18-Mar-2020 |
Greg Kurz <groug@kaod.org> |
KVM: PPC: Fix kernel crash with PR KVM With PR KVM, shutting down a VM causes the host kernel to crash: [ 314.219284] BUG: Unable to handle kernel data access on read at 0xc00800000176c638 [ 314.219299] Faulting instruction address: 0xc008000000d4ddb0 cpu 0x0: Vector: 300 (Data Access) at [c00000036da077a0] pc: c008000000d4ddb0: kvmppc_mmu_pte_flush_all+0x68/0xd0 [kvm_pr] lr: c008000000d4dd94: kvmppc_mmu_pte_flush_all+0x4c/0xd0 [kvm_pr] sp: c00000036da07a30 msr: 900000010280b033 dar: c00800000176c638 dsisr: 40000000 current = 0xc00000036d4c0000 paca = 0xc000000001a00000 irqmask: 0x03 irq_happened: 0x01 pid = 1992, comm = qemu-system-ppc Linux version 5.6.0-master-gku+ (greg@palmb) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #17 SMP Wed Mar 18 13:49:29 CET 2020 enter ? for help [c00000036da07ab0] c008000000d4fbe0 kvmppc_mmu_destroy_pr+0x28/0x60 [kvm_pr] [c00000036da07ae0] c0080000009eab8c kvmppc_mmu_destroy+0x34/0x50 [kvm] [c00000036da07b00] c0080000009e50c0 kvm_arch_vcpu_destroy+0x108/0x140 [kvm] [c00000036da07b30] c0080000009d1b50 kvm_vcpu_destroy+0x28/0x80 [kvm] [c00000036da07b60] c0080000009e4434 kvm_arch_destroy_vm+0xbc/0x190 [kvm] [c00000036da07ba0] c0080000009d9c2c kvm_put_kvm+0x1d4/0x3f0 [kvm] [c00000036da07c00] c0080000009da760 kvm_vm_release+0x38/0x60 [kvm] [c00000036da07c30] c000000000420be0 __fput+0xe0/0x310 [c00000036da07c90] c0000000001747a0 task_work_run+0x150/0x1c0 [c00000036da07cf0] c00000000014896c do_exit+0x44c/0xd00 [c00000036da07dc0] c0000000001492f4 do_group_exit+0x64/0xd0 [c00000036da07e00] c000000000149384 sys_exit_group+0x24/0x30 [c00000036da07e20] c00000000000b9d0 system_call+0x5c/0x68 This is caused by a use-after-free in kvmppc_mmu_pte_flush_all() which dereferences vcpu->arch.book3s which was previously freed by kvmppc_core_vcpu_free_pr(). This happens because kvmppc_mmu_destroy() is called after kvmppc_core_vcpu_free() since commit ff030fdf5573 ("KVM: PPC: Move kvm_vcpu_init() invocation to common code"). The kvmppc_mmu_destroy() helper calls one of the following depending on the KVM backend: - kvmppc_mmu_destroy_hv() which does nothing (Book3s HV) - kvmppc_mmu_destroy_pr() which undoes the effects of kvmppc_mmu_init() (Book3s PR 32-bit) - kvmppc_mmu_destroy_pr() which undoes the effects of kvmppc_mmu_init() (Book3s PR 64-bit) - kvmppc_mmu_destroy_e500() which does nothing (BookE e500/e500mc) It turns out that this is only relevant to PR KVM actually. And both 32 and 64 backends need vcpu->arch.book3s to be valid when calling kvmppc_mmu_destroy_pr(). So instead of calling kvmppc_mmu_destroy() from kvm_arch_vcpu_destroy(), call kvmppc_mmu_destroy_pr() at the beginning of kvmppc_core_vcpu_free_pr(). This is consistent with kvmppc_mmu_init() being the last call in kvmppc_core_vcpu_create_pr(). For the same reason, if kvmppc_core_vcpu_create_pr() returns an error then this means that kvmppc_mmu_init() was either not called or failed, in which case kvmppc_mmu_destroy() should not be called. Drop the line in the error path of kvm_arch_vcpu_create(). Fixes: ff030fdf5573 ("KVM: PPC: Move kvm_vcpu_init() invocation to common code") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/158455341029.178873.15248663726399374882.stgit@bahia.lan
|
#
b2fa4f90 |
|
18-Mar-2020 |
Greg Kurz <groug@kaod.org> |
KVM: PPC: Book3S PR: Fix kernel crash with PR KVM With PR KVM, shutting down a VM causes the host kernel to crash: [ 314.219284] BUG: Unable to handle kernel data access on read at 0xc00800000176c638 [ 314.219299] Faulting instruction address: 0xc008000000d4ddb0 cpu 0x0: Vector: 300 (Data Access) at [c00000036da077a0] pc: c008000000d4ddb0: kvmppc_mmu_pte_flush_all+0x68/0xd0 [kvm_pr] lr: c008000000d4dd94: kvmppc_mmu_pte_flush_all+0x4c/0xd0 [kvm_pr] sp: c00000036da07a30 msr: 900000010280b033 dar: c00800000176c638 dsisr: 40000000 current = 0xc00000036d4c0000 paca = 0xc000000001a00000 irqmask: 0x03 irq_happened: 0x01 pid = 1992, comm = qemu-system-ppc Linux version 5.6.0-master-gku+ (greg@palmb) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #17 SMP Wed Mar 18 13:49:29 CET 2020 enter ? for help [c00000036da07ab0] c008000000d4fbe0 kvmppc_mmu_destroy_pr+0x28/0x60 [kvm_pr] [c00000036da07ae0] c0080000009eab8c kvmppc_mmu_destroy+0x34/0x50 [kvm] [c00000036da07b00] c0080000009e50c0 kvm_arch_vcpu_destroy+0x108/0x140 [kvm] [c00000036da07b30] c0080000009d1b50 kvm_vcpu_destroy+0x28/0x80 [kvm] [c00000036da07b60] c0080000009e4434 kvm_arch_destroy_vm+0xbc/0x190 [kvm] [c00000036da07ba0] c0080000009d9c2c kvm_put_kvm+0x1d4/0x3f0 [kvm] [c00000036da07c00] c0080000009da760 kvm_vm_release+0x38/0x60 [kvm] [c00000036da07c30] c000000000420be0 __fput+0xe0/0x310 [c00000036da07c90] c0000000001747a0 task_work_run+0x150/0x1c0 [c00000036da07cf0] c00000000014896c do_exit+0x44c/0xd00 [c00000036da07dc0] c0000000001492f4 do_group_exit+0x64/0xd0 [c00000036da07e00] c000000000149384 sys_exit_group+0x24/0x30 [c00000036da07e20] c00000000000b9d0 system_call+0x5c/0x68 This is caused by a use-after-free in kvmppc_mmu_pte_flush_all() which dereferences vcpu->arch.book3s which was previously freed by kvmppc_core_vcpu_free_pr(). This happens because kvmppc_mmu_destroy() is called after kvmppc_core_vcpu_free() since commit ff030fdf5573 ("KVM: PPC: Move kvm_vcpu_init() invocation to common code"). The kvmppc_mmu_destroy() helper calls one of the following depending on the KVM backend: - kvmppc_mmu_destroy_hv() which does nothing (Book3s HV) - kvmppc_mmu_destroy_pr() which undoes the effects of kvmppc_mmu_init() (Book3s PR 32-bit) - kvmppc_mmu_destroy_pr() which undoes the effects of kvmppc_mmu_init() (Book3s PR 64-bit) - kvmppc_mmu_destroy_e500() which does nothing (BookE e500/e500mc) It turns out that this is only relevant to PR KVM actually. And both 32 and 64 backends need vcpu->arch.book3s to be valid when calling kvmppc_mmu_destroy_pr(). So instead of calling kvmppc_mmu_destroy() from kvm_arch_vcpu_destroy(), call kvmppc_mmu_destroy_pr() at the beginning of kvmppc_core_vcpu_free_pr(). This is consistent with kvmppc_mmu_init() being the last call in kvmppc_core_vcpu_create_pr(). For the same reason, if kvmppc_core_vcpu_create_pr() returns an error then this means that kvmppc_mmu_init() was either not called or failed, in which case kvmppc_mmu_destroy() should not be called. Drop the line in the error path of kvm_arch_vcpu_create(). Fixes: ff030fdf5573 ("KVM: PPC: Move kvm_vcpu_init() invocation to common code") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
8fc6ba0a |
|
10-Mar-2020 |
Joe Perches <joe@perches.com> |
KVM: PPC: Use fallthrough; Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/ Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
4d395762 |
|
28-Feb-2020 |
Peter Xu <peterx@redhat.com> |
KVM: Remove unnecessary asm/kvm_host.h includes Remove includes of asm/kvm_host.h from files that already include linux/kvm_host.h to make it more obvious that there is no ordering issue between the two headers. linux/kvm_host.h includes asm/kvm_host.h to pick up architecture specific settings, and this will never change, i.e. including asm/kvm_host.h after linux/kvm_host.h may seem problematic, but in practice is simply redundant. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e96c81ee |
|
18-Feb-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: Simplify kvm_free_memslot() and all its descendents Now that all callers of kvm_free_memslot() pass NULL for @dont, remove the param from the top-level routine and all arch's implementations. No functional change intended. Tested-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9d4c197c |
|
18-Feb-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: Drop "const" attribute from old memslot in commit_memory_region() Drop the "const" attribute from @old in kvm_arch_commit_memory_region() to allow arch specific code to free arch specific resources in the old memslot without having to cast away the attribute. Freeing resources in kvm_arch_commit_memory_region() paves the way for simplifying kvm_free_memslot() by eliminating the last usage of its @dont param. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
414de7ab |
|
18-Feb-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: Drop kvm_arch_create_memslot() Remove kvm_arch_create_memslot() now that all arch implementations are effectively nops. Removing kvm_arch_create_memslot() eliminates the possibility for arch specific code to allocate memory prior to setting a memslot, which sets the stage for simplifying kvm_free_memslot(). Cc: Janosch Frank <frankja@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
82307e67 |
|
18-Feb-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: PPC: Move memslot memory allocation into prepare_memory_region() Allocate the rmap array during kvm_arch_prepare_memory_region() to pave the way for removing kvm_arch_create_memslot() altogether. Moving PPC's memory allocation only changes the order of kernel memory allocations between PPC and common KVM code. No functional change intended. Acked-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ddd259c9 |
|
18-Dec-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: Drop kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit() Remove kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit() now that all arch specific implementations are nops. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
74ce2e60 |
|
18-Dec-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: PPC: Move all vcpu init code into kvm_arch_vcpu_create() Fold init() into create() now that the two are called back-to-back by common KVM code (kvm_vcpu_init() calls kvm_arch_vcpu_init() as its last action, and kvm_vm_ioctl_create_vcpu() calls kvm_arch_vcpu_create() immediately thereafter). Rinse and repeat for kvm_arch_vcpu_uninit() and kvm_arch_vcpu_destroy(). This paves the way for removing kvm_arch_vcpu_{un}init() entirely. Note, calling kvmppc_mmu_destroy() if kvmppc_core_vcpu_create() fails may or may not be necessary. Move it along with the more obvious call to kvmppc_subarch_vcpu_uninit() so as not to inadvertantly introduce a functional change and/or bug. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e529ef66 |
|
18-Dec-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: Move vcpu alloc and init invocation to common code Now that all architectures tightly couple vcpu allocation/free with the mandatory calls to kvm_{un}init_vcpu(), move the sequences verbatim to common KVM code. Move both allocation and initialization in a single patch to eliminate thrash in arch specific code. The bisection benefits of moving the two pieces in separate patches is marginal at best, whereas the odds of introducing a transient arch specific bug are non-zero. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4543bdc0 |
|
18-Dec-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: Introduce kvm_vcpu_destroy() Add kvm_vcpu_destroy() and wire up all architectures to call the common function instead of their arch specific implementation. The common destruction function will be used by future patches to move allocation and initialization of vCPUs to common KVM code, i.e. to free resources that are allocated by arch agnostic code. No functional change intended. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
897cc38e |
|
18-Dec-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: Add kvm_arch_vcpu_precreate() to handle pre-allocation issues Add a pre-allocation arch hook to handle checks that are currently done by arch specific code prior to allocating the vCPU object. This paves the way for moving the allocation to common KVM code. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d5279f3a |
|
18-Dec-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: PPC: Drop kvm_arch_vcpu_free() Remove the superfluous kvm_arch_vcpu_free() as it is no longer called from commmon KVM code. Note, kvm_arch_vcpu_destroy() *is* called from common code, i.e. choosing which function to whack is not completely arbitrary. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ff030fdf |
|
18-Dec-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: PPC: Move kvm_vcpu_init() invocation to common code Move the kvm_cpu_{un}init() calls to common PPC code as an intermediate step towards removing kvm_cpu_{un}init() altogether. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c50bfbdc |
|
18-Dec-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: PPC: Allocate vcpu struct in common PPC code Move allocation of all flavors of PPC vCPUs to common PPC code. All variants either allocate 'struct kvm_vcpu' directly, or require that the embedded 'struct kvm_vcpu' member be located at offset 0, i.e. guarantee that the allocation can be directly interpreted as a 'struct kvm_vcpu' object. Remove the message from the build-time assertion regarding placement of the struct, as compatibility with the arch usercopy region is no longer the sole dependent on 'struct kvm_vcpu' being at offset zero. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
22945688 |
|
24-Nov-2019 |
Bharata B Rao <bharata@linux.ibm.com> |
KVM: PPC: Book3S HV: Support reset of secure guest Add support for reset of secure guest via a new ioctl KVM_PPC_SVM_OFF. This ioctl will be issued by QEMU during reset and includes the the following steps: - Release all device pages of the secure guest. - Ask UV to terminate the guest via UV_SVM_TERMINATE ucall - Unpin the VPA pages so that they can be migrated back to secure side when guest becomes secure again. This is required because pinned pages can't be migrated. - Reinit the partition scoped page tables After these steps, guest is ready to issue UV_ESM call once again to switch to secure mode. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> [Implementation of uv_svm_terminate() and its call from guest shutdown path] Signed-off-by: Ram Pai <linuxram@us.ibm.com> [Unpinning of VPA pages] Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
1a9167a2 |
|
19-Jun-2019 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Report single stepping capability When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request the next instruction to be single stepped via the KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure. This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order to inform userspace about the state of single stepping support. We currently don't have support for guest single stepping implemented in Book3S HV so the capability is only present for Book3S PR and BookE. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
2ad7a27d |
|
26-Aug-2019 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S: Enable XIVE native capability only if OPAL has required functions There are some POWER9 machines where the OPAL firmware does not support the OPAL_XIVE_GET_QUEUE_STATE and OPAL_XIVE_SET_QUEUE_STATE calls. The impact of this is that a guest using XIVE natively will not be able to be migrated successfully. On the source side, the get_attr operation on the KVM native device for the KVM_DEV_XIVE_GRP_EQ_CONFIG attribute will fail; on the destination side, the set_attr operation for the same attribute will fail. This adds tests for the existence of the OPAL get/set queue state functions, and if they are not supported, the XIVE-native KVM device is not created and the KVM_CAP_PPC_IRQ_XIVE capability returns false. Userspace can then either provide a software emulation of XIVE, or else tell the guest that it does not have a XIVE controller available to it. Cc: stable@vger.kernel.org # v5.2+ Fixes: 3fab2d10588e ("KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode") Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
741cbbae |
|
03-Aug-2019 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: remove kvm_arch_has_vcpu_debugfs() There is no need for this function as all arches have to implement kvm_arch_create_vcpu_debugfs() no matter what. A #define symbol let us actually simplify the code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
17e433b5 |
|
04-Aug-2019 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: Fix leak vCPU's VMCS value into other pCPU After commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts), a five years old bug is exposed. Running ebizzy benchmark in three 80 vCPUs VMs on one 80 pCPUs Skylake server, a lot of rcu_sched stall warning splatting in the VMs after stress testing: INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073) Call Trace: flush_tlb_mm_range+0x68/0x140 tlb_flush_mmu.part.75+0x37/0xe0 tlb_finish_mmu+0x55/0x60 zap_page_range+0x142/0x190 SyS_madvise+0x3cd/0x9c0 system_call_fastpath+0x1c/0x21 swait_active() sustains to be true before finish_swait() is called in kvm_vcpu_block(), voluntarily preempted vCPUs are taken into account by kvm_vcpu_on_spin() loop greatly increases the probability condition kvm_arch_vcpu_runnable(vcpu) is checked and can be true, when APICv is enabled the yield-candidate vCPU's VMCS RVI field leaks(by vmx_sync_pir_to_irr()) into spinning-on-a-taken-lock vCPU's current VMCS. This patch fixes it by checking conservatively a subset of events. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Marc Zyngier <Marc.Zyngier@arm.com> Cc: stable@vger.kernel.org Fixes: 98f4a1467 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop) Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d94d71cb |
|
29-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 266 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation 51 franklin street fifth floor boston ma 02110 1301 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 67 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141333.953658117@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
f257d6dc |
|
19-Apr-2019 |
Sean Christopherson <seanjc@google.com> |
KVM: Directly return result from kvm_arch_check_processor_compat() Add a wrapper to invoke kvm_arch_check_processor_compat() so that the boilerplate ugliness of checking virtualization support on all CPUs is hidden from the arch specific code. x86's implementation in particular is quite heinous, as it unnecessarily propagates the out-param pattern into kvm_x86_ops. While the x86 specific issue could be resolved solely by changing kvm_x86_ops, make the change for all architectures as returning a value directly is prettier and technically more robust, e.g. s390 doesn't set the out param, which could lead to subtle breakage in the (highly unlikely) scenario where the out-param was not pre-initialized by the caller. Opportunistically annotate svm_check_processor_compat() with __init. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a86cb413 |
|
23-May-2019 |
Thomas Huth <thuth@redhat.com> |
KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all architectures. However, on s390x, the amount of usable CPUs is determined during runtime - it is depending on the features of the machine the code is running on. Since we are using the vcpu_id as an index into the SCA structures that are defined by the hardware (see e.g. the sca_add_vcpu() function), it is not only the amount of CPUs that is limited by the hard- ware, but also the range of IDs that we can use. Thus KVM_CAP_MAX_VCPU_ID must be determined during runtime on s390x, too. So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common code into the architecture specific code, and on s390x we have to return the same value here as for KVM_CAP_MAX_VCPUS. This problem has been discovered with the kvm_create_max_vcpus selftest. With this change applied, the selftest now passes on s390x, too. Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20190523164309.13345-9-thuth@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
#
3fab2d10 |
|
17-Apr-2019 |
Cédric Le Goater <clg@kaod.org> |
KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode Full support for the XIVE native exploitation mode is now available, advertise the capability KVM_CAP_PPC_IRQ_XIVE for guests running on PowerNV KVM Hypervisors only. Support for nested guests (pseries KVM Hypervisor) is not yet available. XIVE should also have been activated which is default setting on POWER9 systems running a recent Linux kernel. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
eacc56bb |
|
17-Apr-2019 |
Cédric Le Goater <clg@kaod.org> |
KVM: PPC: Book3S HV: XIVE: Introduce a new capability KVM_CAP_PPC_IRQ_XIVE The user interface exposes a new capability KVM_CAP_PPC_IRQ_XIVE to let QEMU connect the vCPU presenters to the XIVE KVM device if required. The capability is not advertised for now as the full support for the XIVE native exploitation mode is not yet available. When this is case, the capability will be advertised on PowerNV Hypervisors only. Nested guests (pseries KVM Hypervisor) are not supported. Internally, the interface to the new KVM device is protected with a new interrupt mode: KVMPPC_IRQ_XIVE. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
c110ae57 |
|
28-Mar-2019 |
Paolo Bonzini <pbonzini@redhat.com> |
kvm: move KVM_CAP_NR_MEMSLOTS to common code All architectures except MIPS were defining it in the same way, and memory slots are handled entirely by common code so there is no point in keeping the definition per-architecture. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2b57ecd0 |
|
28-Feb-2019 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char() Add KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST & KVM_PPC_CPU_BEHAV_FLUSH_COUNT_CACHE to the characteristics returned from the H_GET_CPU_CHARACTERISTICS H-CALL, as queried from either the hypervisor or the device tree. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
03f95332 |
|
04-Feb-2019 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVE Currently, the KVM code assumes that if the host kernel is using the XIVE interrupt controller (the new interrupt controller that first appeared in POWER9 systems), then the in-kernel XICS emulation will use the XIVE hardware to deliver interrupts to the guest. However, this only works when the host is running in hypervisor mode and has full access to all of the XIVE functionality. It doesn't work in any nested virtualization scenario, either with PR KVM or nested-HV KVM, because the XICS-on-XIVE code calls directly into the native-XIVE routines, which are not initialized and cannot function correctly because they use OPAL calls, and OPAL is not available in a guest. This means that using the in-kernel XICS emulation in a nested hypervisor that is using XIVE as its interrupt controller will cause a (nested) host kernel crash. To fix this, we change most of the places where the current code calls xive_enabled() to select between the XICS-on-XIVE emulation and the plain XICS emulation to call a new function, xics_on_xive(), which returns false in a guest. However, there is a further twist. The plain XICS emulation has some functions which are used in real mode and access the underlying XICS controller (the interrupt controller of the host) directly. In the case of a nested hypervisor, this means doing XICS hypercalls directly. When the nested host is using XIVE as its interrupt controller, these hypercalls will fail. Therefore this also adds checks in the places where the XICS emulation wants to access the underlying interrupt controller directly, and if that is XIVE, makes the code use the virtual mode fallback paths, which call generic kernel infrastructure rather than doing direct XICS access. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
873db2cd |
|
13-Dec-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guest Allow for a device which is being emulated at L0 (the host) for an L1 guest to be passed through to a nested (L2) guest. The existing kvmppc_hv_emulate_mmio function can be used here. The main challenge is that for a load the result must be stored into the L2 gpr, not an L1 gpr as would normally be the case after going out to qemu to complete the operation. This presents a challenge as at this point the L2 gpr state has been written back into L1 memory. To work around this we store the address in L1 memory of the L2 gpr where the result of the load is to be stored and use the new io_gpr value KVM_MMIO_REG_NESTED_GPR to indicate that this is a nested load for which completion must be done when returning back into the kernel. Then in kvmppc_complete_mmio_load() the resultant value is written into L1 memory at the location of the indicated L2 gpr. Note that we don't currently let an L1 guest emulate a device for an L2 guest which is then passed through to an L3 guest. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
cc6929cc |
|
13-Dec-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants The functions kvmppc_st and kvmppc_ld are used to access guest memory from the host using a guest effective address. They do so by translating through the process table to obtain a guest real address and then using kvm_read_guest or kvm_write_guest to make the access with the guest real address. This method of access however only works for L1 guests and will give the incorrect results for a nested guest. We can however use the store_to_eaddr and load_from_eaddr kvmppc_ops to perform the access for a nested guesti (and a L1 guest). So attempt this method first and fall back to the old method if this fails and we aren't running a nested guest. At this stage there is no fall back method to perform the access for a nested guest and this is left as a future improvement. For now we will return to the nested guest and rely on the fact that a translation should be faulted in before retrying the access. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
693ac10a |
|
13-Dec-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines The kvm capability KVM_CAP_SPAPR_TCE_VFIO is used to indicate the availability of in kernel tce acceleration for vfio. However it is currently the case that this is only available on a powernv machine, not for a pseries machine. Thus make this capability dependent on having the cpu feature CPU_FTR_HVMODE. [paulus@ozlabs.org - fixed compilation for Book E.] Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
f032b734 |
|
11-Dec-2018 |
Bharata B Rao <bharata@linux.ibm.com> |
KVM: PPC: Pass change type down to memslot commit function Currently, kvm_arch_commit_memory_region() gets called with a parameter indicating what type of change is being made to the memslot, but it doesn't pass it down to the platform-specific memslot commit functions. This adds the `change' parameter to the lower-level functions so that they can use it in future. [paulus@ozlabs.org - fix book E also.] Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
e5d83c74 |
|
16-Feb-2017 |
Paolo Bonzini <pbonzini@redhat.com> |
kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic The first such capability to be handled in virt/kvm/ will be manual dirty page reprotection. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
aa069a99 |
|
21-Sep-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Add a VM capability to enable nested virtualization With this, userspace can enable a KVM-HV guest to run nested guests under it. The administrator can control whether any nested guests can be run; setting the "nested" module parameter to false prevents any guests becoming nested hypervisors (that is, any attempt to enable the nested capability on a guest will fail). Guests which are already nested hypervisors will continue to be so. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
de760db4 |
|
07-Oct-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Allow HV module to load without hypervisor mode With this, the KVM-HV module can be loaded in a guest running under KVM-HV, and if the hypervisor supports nested virtualization, this guest can now act as a nested hypervisor and run nested guests. This also adds some checks to inform userspace that HPT guests are not supported by nested hypervisors (by returning false for the KVM_CAP_PPC_MMU_HASH_V3 capability), and to prevent userspace from configuring a guest to use HPT mode. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
45ef5992 |
|
05-Jul-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: remove unnecessary inclusion of asm/tlbflush.h asm/tlbflush.h is only needed for: - using functions xxx_flush_tlb_xxx() - using MMU_NO_CONTEXT - including asm-generic/pgtable.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
4eeb8556 |
|
27-May-2018 |
Simon Guo <wei.guo.simon@gmail.com> |
KVM: PPC: Remove mmio_vsx_tx_sx_enabled in KVM MMIO emulation Originally PPC KVM MMIO emulation uses only 0~31#(5 bits) for VSR reg number, and use mmio_vsx_tx_sx_enabled field together for 0~63# VSR regs. Currently PPC KVM MMIO emulation is reimplemented with analyse_instr() assistance. analyse_instr() returns 0~63 for VSR register number, so it is not necessary to use additional mmio_vsx_tx_sx_enabled field any more. This patch extends related reg bits (expand io_gpr to u16 from u8 and use 6 bits for VSR reg#), so that mmio_vsx_tx_sx_enabled can be removed. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
1499fa80 |
|
18-Apr-2018 |
Souptick Joarder <jrdr.linux@gmail.com> |
kvm: Change return type to vm_fault_t Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. commit 1c8f422059ae ("mm: change return type to vm_fault_t") Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c8235c28 |
|
23-May-2018 |
Simon Guo <wei.guo.simon@gmail.com> |
KVM: PPC: Remove load/put vcpu for KVM_GET/SET_ONE_REG ioctl Since the vcpu mutex locking/unlock has been moved out of vcpu_load() /vcpu_put(), KVM_GET_ONE_REG and KVM_SET_ONE_REG doesn't need to do ioctl with loading vcpu anymore. This patch removes vcpu_load()/vcpu_put() from KVM_GET_ONE_REG and KVM_SET_ONE_REG ioctl. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
b3cebfe8 |
|
23-May-2018 |
Simon Guo <wei.guo.simon@gmail.com> |
KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl Although we already have kvm_arch_vcpu_async_ioctl() which doesn't require ioctl to load vcpu, the sync ioctl code need to be cleaned up when CONFIG_HAVE_KVM_VCPU_ASYNC_IOCTL is not configured. This patch moves vcpu_load/vcpu_put down to each ioctl switch case so that each ioctl can decide to do vcpu_load/vcpu_put or not independently. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
d234d68e |
|
23-May-2018 |
Simon Guo <wei.guo.simon@gmail.com> |
KVM: PPC: Book3S PR: Enable HTM for PR KVM for KVM_CHECK_EXTENSION ioctl With current patch set, PR KVM now supports HTM. So this patch turns it on for PR KVM. Tested with: https://github.com/justdoitqd/publicFiles/blob/master/test_kvm_htm_cap.c Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
acc9eb93 |
|
20-May-2018 |
Simon Guo <wei.guo.simon@gmail.com> |
KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction mmio emulation with analyse_instr() input This patch reimplements LOAD_VMX/STORE_VMX MMIO emulation with analyse_instr() input. When emulating the store, the VMX reg will need to be flushed so that the right reg val can be retrieved before writing to IO MEM. This patch also adds support for lvebx/lvehx/lvewx/stvebx/stvehx/stvewx MMIO emulation. To meet the requirement of handling different element sizes, kvmppc_handle_load128_by2x64()/kvmppc_handle_store128_by2x64() were replaced with kvmppc_handle_vmx_load()/kvmppc_handle_vmx_store(). The framework used is similar to VSX instruction MMIO emulation. Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
da2a32b8 |
|
20-May-2018 |
Simon Guo <wei.guo.simon@gmail.com> |
KVM: PPC: Expand mmio_vsx_copy_type to cover VMX load/store element types VSX MMIO emulation uses mmio_vsx_copy_type to represent VSX emulated element size/type, such as KVMPPC_VSX_COPY_DWORD_LOAD, etc. This patch expands mmio_vsx_copy_type to cover VMX copy type, such as KVMPPC_VMX_COPY_BYTE(stvebx/lvebx), etc. As a result, mmio_vsx_copy_type is also renamed to mmio_copy_type. It is a preparation for reimplementing VMX MMIO emulation. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
2e6baa46 |
|
20-May-2018 |
Simon Guo <wei.guo.simon@gmail.com> |
KVM: PPC: Add giveup_ext() hook to PPC KVM ops Currently HV will save math regs(FP/VEC/VSX) when trap into host. But PR KVM will only save math regs when qemu task switch out of CPU, or when returning from qemu code. To emulate FP/VEC/VSX mmio load, PR KVM need to make sure that math regs were flushed firstly and then be able to update saved VCPU FPR/VEC/VSX area reasonably. This patch adds giveup_ext() field to KVM ops. Only PR KVM has non-NULL giveup_ext() ops. kvmppc_complete_mmio_load() can invoke that hook (when not NULL) to flush math regs accordingly, before updating saved register vals. Math regs flush is also necessary for STORE, which will be covered in later patch within this patch series. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
94dd7fa1 |
|
20-May-2018 |
Simon Guo <wei.guo.simon@gmail.com> |
KVM: PPC: Add KVMPPC_VSX_COPY_WORD_LOAD_DUMP type support for mmio emulation Some VSX instructions like lxvwsx will splat word into VSR. This patch adds a new VSX copy type KVMPPC_VSX_COPY_WORD_LOAD_DUMP to support this. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
f19d1f36 |
|
07-May-2018 |
Simon Guo <wei.guo.simon@gmail.com> |
KVM: PPC: Fix a mmio_host_swabbed uninitialized usage issue When KVM emulates VMX store, it will invoke kvmppc_get_vmx_data() to retrieve VMX reg val. kvmppc_get_vmx_data() will check mmio_host_swabbed to decide which double word of vr[] to be used. But the mmio_host_swabbed can be uninitialized during VMX store procedure: kvmppc_emulate_loadstore \- kvmppc_handle_store128_by2x64 \- kvmppc_get_vmx_data So vcpu->arch.mmio_host_swabbed is not meant to be used at all for emulation of store instructions, and this patch makes that true for VMX stores. This patch also initializes mmio_host_swabbed to avoid possible future problems. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
4bb3c7a0 |
|
21-Mar-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9 POWER9 has hardware bugs relating to transactional memory and thread reconfiguration (changes to hardware SMT mode). Specifically, the core does not have enough storage to store a complete checkpoint of all the architected state for all four threads. The DD2.2 version of POWER9 includes hardware modifications designed to allow hypervisor software to implement workarounds for these problems. This patch implements those workarounds in KVM code so that KVM guests see a full, working transactional memory implementation. The problems center around the use of TM suspended state, where the CPU has a checkpointed state but execution is not transactional. The workaround is to implement a "fake suspend" state, which looks to the guest like suspended state but the CPU does not store a checkpoint. In this state, any instruction that would cause a transition to transactional state (rfid, rfebb, mtmsrd, tresume) or would use the checkpointed state (treclaim) causes a "soft patch" interrupt (vector 0x1500) to the hypervisor so that it can be emulated. The trechkpt instruction also causes a soft patch interrupt. On POWER9 DD2.2, we avoid returning to the guest in any state which would require a checkpoint to be present. The trechkpt in the guest entry path which would normally create that checkpoint is replaced by either a transition to fake suspend state, if the guest is in suspend state, or a rollback to the pre-transactional state if the guest is in transactional state. Fake suspend state is indicated by a flag in the PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and reads back as 0. On exit from the guest, if the guest is in fake suspend state, we still do the treclaim instruction as we would in real suspend state, in order to get into non-transactional state, but we do not save the resulting register state since there was no checkpoint. Emulation of the instructions that cause a softpatch interrupt is handled in two paths. If the guest is in real suspend mode, we call kvmhv_p9_tm_emulation_early() to handle the cases where the guest is transitioning to transactional state. This is called before we do the treclaim in the guest exit path; because we haven't done treclaim, we can get back to the guest with the transaction still active. If the instruction is a case that kvmhv_p9_tm_emulation_early() doesn't handle, or if the guest is in fake suspend state, then we proceed to do the complete guest exit path and subsequently call kvmhv_p9_tm_emulation() in host context with the MMU on. This handles all the cases including the cases that generate program interrupts (illegal instruction or TM Bad Thing) and facility unavailable interrupts. The emulation is reasonably straightforward and is mostly concerned with checking for exception conditions and updating the state of registers such as MSR and CR0. The treclaim emulation takes care to ensure that the TEXASR register gets updated as if it were the guest treclaim instruction that had done failure recording, not the treclaim done in hypervisor state in the guest exit path. With this, the KVM_CAP_PPC_HTM capability returns true (1) even if transactional memory is not available to host userspace. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
6df3877f |
|
12-Feb-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S: Fix compile error that occurs with some gcc versions Some versions of gcc generate a warning that the variable "emulated" may be used uninitialized in function kvmppc_handle_load128_by2x64(). It would be used uninitialized if kvmppc_handle_load128_by2x64 was ever called with vcpu->arch.mmio_vmx_copy_nums == 0, but neither of the callers ever do that, so there is no actual bug. When gcc generates a warning, it causes the build to fail because arch/powerpc is compiled with -Werror. This silences the warning by initializing "emulated" to EMULATE_DONE. Fixes: 09f984961c13 ("KVM: PPC: Book3S: Add MMIO emulation for VMX instructions") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
c662f773 |
|
12-Feb-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Fix compile error that occurs when CONFIG_ALTIVEC=n Commit accb757d798c ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_run", 2017-12-04) added a "goto out" statement and an "out:" label to kvm_arch_vcpu_ioctl_run(). Since the only "goto out" is inside a CONFIG_VSX block, compiling with CONFIG_VSX=n gives a warning that label "out" is defined but not used, and because arch/powerpc is compiled with -Werror, that becomes a compile error that makes the kernel build fail. Merge commit 1ab03c072feb ("Merge tag 'kvm-ppc-next-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc", 2018-02-09) added a similar block of code inside a #ifdef CONFIG_ALTIVEC, with a "goto out" statement. In order to make the build succeed, this adds a #ifdef around the "out:" label. This is a minimal, ugly fix, to be replaced later by a refactoring of the code. Since CONFIG_VSX depends on CONFIG_ALTIVEC, it is sufficient to use #ifdef CONFIG_ALTIVEC here. Fixes: accb757d798c ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_run") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
09f98496 |
|
03-Feb-2018 |
Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> |
KVM: PPC: Book3S: Add MMIO emulation for VMX instructions This patch provides the MMIO load/store vector indexed X-Form emulation. Instructions implemented: lvx: the quadword in storage addressed by the result of EA & 0xffff_ffff_ffff_fff0 is loaded into VRT. stvx: the contents of VRS are stored into the quadword in storage addressed by the result of EA & 0xffff_ffff_ffff_fff0. Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com> Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
790a9df5 |
|
01-Feb-2018 |
David Gibson <david@gibson.dropbear.id.au> |
KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 This adds code to enable the HPT resizing code to work on POWER9, which uses a slightly modified HPT entry format compared to POWER8. On POWER9, we convert HPTEs read from the HPT from the new format to the old format so that the rest of the HPT resizing code can work as before. HPTEs written to the new HPT are converted to the new format as the last step before writing them into the new HPT. This takes out the checks added by commit bcd3bb63dbc8 ("KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now", 2017-02-18), now that HPT resizing works on POWER9. On POWER9, when we pivot to the new HPT, we now call kvmppc_setup_partition_table() to update the partition table in order to make the hardware use the new HPT. [paulus@ozlabs.org - added kvmppc_setup_partition_table() call, wrote commit message.] Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
3214d01f |
|
14-Jan-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S: Provide information about hardware/firmware CVE workarounds This adds a new ioctl, KVM_PPC_GET_CPU_CHAR, that gives userspace information about the underlying machine's level of vulnerability to the recently announced vulnerabilities CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754, and whether the machine provides instructions to assist software to work around the vulnerabilities. The ioctl returns two u64 words describing characteristics of the CPU and required software behaviour respectively, plus two mask words which indicate which bits have been filled in by the kernel, for extensibility. The bit definitions are the same as for the new H_GET_CPU_CHARACTERISTICS hypercall. There is also a new capability, KVM_CAP_PPC_GET_CPU_CHAR, which indicates whether the new ioctl is available. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
5855564c |
|
12-Jan-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Enable migration of decrementer register This adds a register identifier for use with the one_reg interface to allow the decrementer expiry time to be read and written by userspace. The decrementer expiry time is in guest timebase units and is equal to the sum of the decrementer and the guest timebase. (The expiry time is used rather than the decrementer value itself because the expiry time is not constantly changing, though the decrementer value is, while the guest vcpu is not running.) Without this, a guest vcpu migrated to a new host will see its decrementer set to some random value. On POWER8 and earlier, the decrementer is 32 bits wide and counts down at 512MHz, so the guest vcpu will potentially see no decrementer interrupts for up to about 4 seconds, which will lead to a stall. With POWER9, the decrementer is now 56 bits side, so the stall can be much longer (up to 2.23 years) and more noticeable. To help work around the problem in cases where userspace has not been updated to migrate the decrementer expiry time, we now set the default decrementer expiry at vcpu creation time to the current time rather than the maximum possible value. This should mean an immediate decrementer interrupt when a migrated vcpu starts running. In cases where the decrementer is 32 bits wide and more than 4 seconds elapse between the creation of the vcpu and when it first runs, the decrementer would have wrapped around to positive values and there may still be a stall - but this is no worse than the current situation. In the large-decrementer case, we are sure to get an immediate decrementer interrupt (assuming the time from vcpu creation to first run is less than 2.23 years) and we thus avoid a very long stall. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
5cb0944c |
|
12-Dec-2017 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: introduce kvm_arch_vcpu_async_ioctl After the vcpu_load/vcpu_put pushdown, the handling of asynchronous VCPU ioctl is already much clearer in that it is obvious that they bypass vcpu_load and vcpu_put. However, it is still not perfect in that the different state of the VCPU mutex is still hidden in the caller. Separate those ioctls into a new function kvm_arch_vcpu_async_ioctl that returns -ENOIOCTLCMD for more "traditional" synchronous ioctls. Cc: James Hogan <jhogan@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9b062471 |
|
04-Dec-2017 |
Christoffer Dall <christoffer.dall@linaro.org> |
KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl Move the calls to vcpu_load() and vcpu_put() in to the architecture specific implementations of kvm_arch_vcpu_ioctl() which dispatches further architecture-specific ioctls on to other functions. Some architectures support asynchronous vcpu ioctls which cannot call vcpu_load() or take the vcpu->mutex, because that would prevent concurrent execution with a running VCPU, which is the intended purpose of these ioctls, for example because they inject interrupts. We repeat the separate checks for these specifics in the architecture code for MIPS, S390 and PPC, and avoid taking the vcpu->mutex and calling vcpu_load for these ioctls. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
accb757d |
|
04-Dec-2017 |
Christoffer Dall <christoffer.dall@linaro.org> |
KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_run Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_run(). Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390 parts Reviewed-by: Cornelia Huck <cohuck@redhat.com> [Rebased. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
20b7035c |
|
24-Nov-2017 |
Jan H. Schönherr <jschoenh@amazon.de> |
KVM: Let KVM_SET_SIGNAL_MASK work as advertised KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that "any unblocked signal received [...] will cause KVM_RUN to return with -EINTR" and that "the signal will only be delivered if not blocked by the original signal mask". This, however, is only true, when the calling task has a signal handler registered for a signal. If not, signal evaluation is short-circuited for SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN returning or the whole process is terminated. Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar to that in do_sigtimedwait() to avoid short-circuiting of signals. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9aa6825b |
|
20-Nov-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S: Eliminate some unnecessary checks In an excess of caution, commit 6f63e81bda98 ("KVM: PPC: Book3S: Add MMIO emulation for FP and VSX instructions", 2017-02-21) included checks for the case that vcpu->arch.mmio_vsx_copy_nums is less than zero, even though its type is u8. This causes a Coverity warning, so we remove the check for < 0. We also adjust the associated comment to be more accurate ("4 or less" rather than "less than 4"). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
18c3640c |
|
13-Sep-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Add infrastructure for running HPT guests on radix host This sets up the machinery for switching a guest between HPT (hashed page table) and radix MMU modes, so that in future we can run a HPT guest on a radix host on POWER9 machines. * The KVM_PPC_CONFIGURE_V3_MMU ioctl can now specify either HPT or radix mode, on a radix host. * The KVM_CAP_PPC_MMU_HASH_V3 capability now returns 1 on POWER9 with HV KVM on a radix host. * The KVM_PPC_GET_SMMU_INFO returns information about the HPT MMU on a radix host. * The KVM_PPC_ALLOCATE_HTAB ioctl on a radix host will switch the guest to HPT mode and allocate a HPT. * For simplicity, we now allocate the rmap array for each memslot, even on a radix host, since it will be needed if the guest switches to HPT mode. * Since we cannot yet run a HPT guest on a radix host, the KVM_RUN ioctl will return an EINVAL error in that case. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
2a3d6553 |
|
12-Oct-2017 |
Michael Ellerman <mpe@ellerman.id.au> |
KVM: PPC: Tie KVM_CAP_PPC_HTM to the user-visible TM feature Currently we use CPU_FTR_TM to decide if the CPU/kernel can support TM (Transactional Memory), and if it's true we advertise that to Qemu (or similar) via KVM_CAP_PPC_HTM. PPC_FEATURE2_HTM is the user-visible feature bit, which indicates that the CPU and kernel can support TM. Currently CPU_FTR_TM and PPC_FEATURE2_HTM always have the same value, either true or false, so using the former for KVM_CAP_PPC_HTM is correct. However some Power9 CPUs can operate in a mode where TM is enabled but TM suspended state is disabled. In this mode CPU_FTR_TM is true, but PPC_FEATURE2_HTM is false. Instead a different PPC_FEATURE2 bit is set, to indicate that this different mode of TM is available. It is not safe to let guests use TM as-is, when the CPU is in this mode. So to prevent that from happening, use PPC_FEATURE2_HTM to determine the value of KVM_CAP_PPC_HTM. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
ac64115a |
|
14-Sep-2017 |
Greg Kurz <groug@kaod.org> |
KVM: PPC: Fix oops when checking KVM_CAP_PPC_HTM The following program causes a kernel oops: #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/ioctl.h> #include <linux/kvm.h> main() { int fd = open("/dev/kvm", O_RDWR); ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_HTM); } This happens because when using the global KVM fd with KVM_CHECK_EXTENSION, kvm_vm_ioctl_check_extension() gets called with a NULL kvm argument, which gets dereferenced in is_kvmppc_hv_enabled(). Spotted while reading the code. Let's use the hv_enabled fallback variable, like everywhere else in this function. Fixes: 23528bb21ee2 ("KVM: PPC: Introduce KVM_CAP_PPC_HTM") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
199b5763 |
|
07-Aug-2017 |
Longpeng(Mike) <longpeng2@huawei.com> |
KVM: add spinlock optimization framework If a vcpu exits due to request a user mode spinlock, then the spinlock-holder may be preempted in user mode or kernel mode. (Note that not all architectures trap spin loops in user mode, only AMD x86 and ARM/ARM64 currently do). But if a vcpu exits in kernel mode, then the holder must be preempted in kernel mode, so we should choose a vcpu in kernel mode as a more likely candidate for the lock holder. This introduces kvm_arch_vcpu_in_kernel() to decide whether the vcpu is in kernel-mode when it's preempted. kvm_vcpu_on_spin's new argument says the same of the spinning VCPU. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2ed4f9dd |
|
21-Jun-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Add capability to report possible virtual SMT modes Now that userspace can set the virtual SMT mode by enabling the KVM_CAP_PPC_SMT capability, it is useful for userspace to be able to query the set of possible virtual SMT modes. This provides a new capability, KVM_CAP_PPC_SMT_POSSIBLE, to provide this information. The return value is a bitmap of possible modes, with bit N set if virtual SMT mode 2^N is available. That is, 1 indicates SMT1 is available, 2 indicates that SMT2 is available, 3 indicates that both SMT1 and SMT2 are available, and so on. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
134764ed |
|
11-May-2017 |
Aravinda Prasad <aravinda@linux.vnet.ibm.com> |
KVM: PPC: Book3S HV: Add new capability to control MCE behaviour This introduces a new KVM capability to control how KVM behaves on machine check exception (MCE) in HV KVM guests. If this capability has not been enabled, KVM redirects machine check exceptions to guest's 0x200 vector, if the address in error belongs to the guest. With this capability enabled, KVM will cause a guest exit with the exit reason indicating an NMI. The new capability is required to avoid problems if a new kernel/KVM is used with an old QEMU, running a guest that doesn't issue "ibm,nmi-register". As old QEMU does not understand the NMI exit type, it treats it as a fatal error. However, the guest could have handled the machine check error if the exception was delivered to guest's 0x200 interrupt vector instead of NMI exit in case of old QEMU. [paulus@ozlabs.org - Reworded the commit message to be clearer, enable only on HV KVM.] Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
57900694 |
|
16-May-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9 On POWER9, we no longer have the restriction that we had on POWER8 where all threads in a core have to be in the same partition, so the CPU threads are now independent. However, we still want to be able to run guests with a virtual SMT topology, if only to allow migration of guests from POWER8 systems to POWER9. A guest that has a virtual SMT mode greater than 1 will expect to be able to use the doorbell facility; it will expect the msgsndp and msgclrp instructions to work appropriately and to be able to read sensible values from the TIR (thread identification register) and DPDES (directed privileged doorbell exception status) special-purpose registers. However, since each CPU thread is a separate sub-processor in POWER9, these instructions and registers can only be used within a single CPU thread. In order for these instructions to appear to act correctly according to the guest's virtual SMT mode, we have to trap and emulate them. We cause them to trap by clearing the HFSCR_MSGP bit in the HFSCR register. The emulation is triggered by the hypervisor facility unavailable interrupt that occurs when the guest uses them. To cause a doorbell interrupt to occur within the guest, we set the DPDES register to 1. If the guest has interrupts enabled, the CPU will generate a doorbell interrupt and clear the DPDES register in hardware. The DPDES hardware register for the guest is saved in the vcpu->arch.vcore->dpdes field. Since this gets written by the guest exit code, other VCPUs wishing to cause a doorbell interrupt don't write that field directly, but instead set a vcpu->arch.doorbell_request flag. This is consumed and set to 0 by the guest entry code, which then sets DPDES to 1. Emulating reads of the DPDES register is somewhat involved, because it requires reading the doorbell pending interrupt status of all of the VCPU threads in the virtual core, and if any of those VCPUs are running, their doorbell status is only up-to-date in the hardware DPDES registers of the CPUs where they are running. In order to get a reasonable approximation of the current doorbell status, we send those CPUs an IPI, causing an exit from the guest which will update the vcpu->arch.vcore->dpdes field. We then use that value in constructing the emulated DPDES register value. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
3c313524 |
|
05-Feb-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Allow userspace to set the desired SMT mode This allows userspace to set the desired virtual SMT (simultaneous multithreading) mode for a VM, that is, the number of VCPUs that get assigned to each virtual core. Previously, the virtual SMT mode was fixed to the number of threads per subcore, and if userspace wanted to have fewer vcpus per vcore, then it would achieve that by using a sparse CPU numbering. This had the disadvantage that the vcpu numbers can get quite large, particularly for SMT1 guests on a POWER8 with 8 threads per core. With this patch, userspace can set its desired virtual SMT mode and then use contiguous vcpu numbering. On POWER8, where the threading mode is "strict", the virtual SMT mode must be less than or equal to the number of threads per subcore. On POWER9, which implements a "loose" threading mode, the virtual SMT mode can be any power of 2 between 1 and 8, even though there is effectively one thread per subcore, since the threads are independent and can all be in different partitions. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
2fa6e1e1 |
|
04-Jun-2017 |
Radim Krčmář <rkrcmar@redhat.com> |
KVM: add kvm_request_pending A first step in vcpu->requests encapsulation. Additionally, we now use READ_ONCE() when accessing vcpu->requests, which ensures we always load vcpu->requests when it's accessed. This is important as other threads can change it any time. Also, READ_ONCE() documents that vcpu->requests is used with other threads, likely requiring memory barriers, which it does. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> [ Documented the new use of READ_ONCE() and converted another check in arch/mips/kvm/vz.c ] Signed-off-by: Andrew Jones <drjones@redhat.com> Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
|
#
76d837a4 |
|
10-May-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms Commit e91aa8e6ecd5 ("KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently", 2017-03-22) enabled the SPAPR TCE code for all 64-bit Book 3S kernel configurations in order to simplify the code and reduce #ifdefs. However, 64-bit Book 3S PPC platforms other than pseries and powernv don't implement the necessary IOMMU callbacks, leading to build failures like the following (for a pasemi config): scripts/kconfig/conf --silentoldconfig Kconfig warning: (KVM_BOOK3S_64) selects SPAPR_TCE_IOMMU which has unmet direct dependencies (IOMMU_SUPPORT && (PPC_POWERNV || PPC_PSERIES)) ... CC [M] arch/powerpc/kvm/book3s_64_vio.o /home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_64_vio.c: In function ‘kvmppc_clear_tce’: /home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_64_vio.c:363:2: error: implicit declaration of function ‘iommu_tce_xchg’ [-Werror=implicit-function-declaration] iommu_tce_xchg(tbl, entry, &hpa, &dir); ^ To fix this, we make the inclusion of the SPAPR TCE support, and the code that uses it in book3s_vio.c and book3s_vio_hv.c, depend on the inclusion of support for the pseries and/or powernv platforms. This means that when running a 'pseries' guest on those platforms, the guest won't have in-kernel acceleration of the PAPR TCE hypercalls, but at least now they compile. Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
72875d8a |
|
26-Apr-2017 |
Radim Krčmář <rkrcmar@redhat.com> |
KVM: add kvm_{test,clear}_request to replace {test,clear}_bit Users were expected to use kvm_check_request() for testing and clearing, but request have expanded their use since then and some users want to only test or do a faster clear. Make sure that requests are not directly accessed with bit operations. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
5af50993 |
|
05-Apr-2017 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
121f80ba |
|
21-Mar-2017 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: VFIO: Add in-kernel acceleration for VFIO This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO without passing them to user space which saves time on switching to user space and back. This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM. KVM tries to handle a TCE request in the real mode, if failed it passes the request to the virtual mode to complete the operation. If it a virtual mode handler fails, the request is passed to the user space; this is not expected to happen though. To avoid dealing with page use counters (which is tricky in real mode), this only accelerates SPAPR TCE IOMMU v2 clients which are required to pre-register the userspace memory. The very first TCE request will be handled in the VFIO SPAPR TCE driver anyway as the userspace view of the TCE table (iommu_table::it_userspace) is not allocated till the very first mapping happens and we cannot call vmalloc in real mode. If we fail to update a hardware IOMMU table unexpected reason, we just clear it and move on as there is nothing really we can do about it - for example, if we hot plug a VFIO device to a guest, existing TCE tables will be mirrored automatically to the hardware and there is no interface to report to the guest about possible failures. This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd and associates a physical IOMMU table with the SPAPR TCE table (which is a guest view of the hardware IOMMU table). The iommu_table object is cached and referenced so we do not have to look up for it in real mode. This does not implement the UNSET counterpart as there is no use for it - once the acceleration is enabled, the existing userspace won't disable it unless a VFIO container is destroyed; this adds necessary cleanup to the KVM_DEV_VFIO_GROUP_DEL handler. This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user space. This adds real mode version of WARN_ON_ONCE() as the generic version causes problems with rcu_sched. Since we testing what vmalloc_to_phys() returns in the code, this also adds a check for already existing vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect(). This finally makes use of vfio_external_user_iommu_id() which was introduced quite some time ago and was considered for removal. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
6f63e81b |
|
21-Feb-2017 |
Bin Lu <lblulb@linux.vnet.ibm.com> |
KVM: PPC: Book3S: Add MMIO emulation for FP and VSX instructions This patch provides the MMIO load/store emulation for instructions of 'double & vector unsigned char & vector signed char & vector unsigned short & vector signed short & vector unsigned int & vector signed int & vector double '. The instructions that this adds emulation for are: - ldx, ldux, lwax, - lfs, lfsx, lfsu, lfsux, lfd, lfdx, lfdu, lfdux, - stfs, stfsx, stfsu, stfsux, stfd, stfdx, stfdu, stfdux, stfiwx, - lxsdx, lxsspx, lxsiwax, lxsiwzx, lxvd2x, lxvw4x, lxvdsx, - stxsdx, stxsspx, stxsiwx, stxvd2x, stxvw4x [paulus@ozlabs.org - some cleanups, fixes and rework, make it compile for Book E, fix build when PR KVM is built in] Signed-off-by: Bin Lu <lblulb@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
30422558 |
|
31-Mar-2017 |
Paolo Bonzini <pbonzini@redhat.com> |
kvm: make KVM_CAP_COALESCED_MMIO architecture agnostic Remove code from architecture files that can be moved to virt/kvm, since there is already common code for coalesced MMIO. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Removed a pointless 'break' after 'return'.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
174cd4b1 |
|
02-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
bcd3bb63 |
|
17-Feb-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now The new HPT resizing code added in commit b5baa6877315 ("KVM: PPC: Book3S HV: KVM-HV HPT resizing implementation", 2016-12-20) doesn't have code to handle the new HPTE format which POWER9 uses. Thus it would be best not to advertise it to userspace on POWER9 systems until it works properly. Also, since resize_hpt_rehash_hpte() contains BUG_ON() calls that could be hit on POWER9, let's prevent it from being called on POWER9 for now. Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
460df4c1 |
|
08-Feb-2017 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: race-free exit from KVM_RUN without POSIX signals The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick" a VCPU out of KVM_RUN through a POSIX signal. A signal is attached to a dummy signal handler; by blocking the signal outside KVM_RUN and unblocking it inside, this possible race is closed: VCPU thread service thread -------------------------------------------------------------- check flag set flag raise signal (signal handler does nothing) KVM_RUN However, one issue with KVM_SET_SIGNAL_MASK is that it has to take tsk->sighand->siglock on every KVM_RUN. This lock is often on a remote NUMA node, because it is on the node of a thread's creator. Taking this lock can be very expensive if there are many userspace exits (as is the case for SMP Windows VMs without Hyper-V reference time counter). As an alternative, we can put the flag directly in kvm_run so that KVM can see it: VCPU thread service thread -------------------------------------------------------------- raise signal signal handler set run->immediate_exit KVM_RUN check run->immediate_exit Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
050f2339 |
|
19-Dec-2016 |
David Gibson <david@gibson.dropbear.id.au> |
KVM: PPC: Book3S HV: Advertise availablity of HPT resizing on KVM HV This updates the KVM_CAP_SPAPR_RESIZE_HPT capability to advertise the presence of in-kernel HPT resizing on KVM HV. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
8cf4ecc0 |
|
30-Jan-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Enable radix guest support This adds a few last pieces of the support for radix guests: * Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and KVM_PPC_GET_RMMU_INFO ioctls for radix guests * On POWER9, allow secondary threads to be on/off-lined while guests are running. * Set up LPCR and the partition table entry for radix guests. * Don't allocate the rmap array in the kvm_memory_slot structure on radix. * Don't try to initialize the HPT for radix guests, since they don't have an HPT. * Take out the code that prevents the HV KVM module from initializing on radix hosts. At this stage, we only support radix guests if the host is running in radix mode, and only support HPT guests if the host is running in HPT mode. Thus a guest cannot switch from one mode to the other, which enables some simplifications. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
468808bd |
|
30-Jan-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9 This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl for HPT guests on POWER9. With this, we can return 1 for the KVM_CAP_PPC_MMU_HASH_V3 capability. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c9270132 |
|
30-Jan-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU This adds two capabilities and two ioctls to allow userspace to find out about and configure the POWER9 MMU in a guest. The two capabilities tell userspace whether KVM can support a guest using the radix MMU, or using the hashed page table (HPT) MMU with a process table and segment tables. (Note that the MMUs in the POWER9 processor cores do not use the process and segment tables when in HPT mode, but the nest MMU does). The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify whether a guest will use the radix MMU or the HPT MMU, and to specify the size and location (in guest space) of the process table. The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about the radix MMU. It returns a list of supported radix tree geometries (base page size and number of bits indexed at each level of the radix tree) and the encoding used to specify the various page sizes for the TLB invalidate entry instruction. Initially, both capabilities return 0 and the ioctls return -EINVAL, until the necessary infrastructure for them to operate correctly is added. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
7c0f6ba6 |
|
24-Dec-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a8acaece |
|
22-Nov-2016 |
David Gibson <david@gibson.dropbear.id.au> |
KVM: PPC: Correctly report KVM_CAP_PPC_ALLOC_HTAB At present KVM on powerpc always reports KVM_CAP_PPC_ALLOC_HTAB as enabled. However, the ioctl() it advertises (KVM_PPC_ALLOCATE_HTAB) only actually works on KVM HV. On KVM PR it will fail with ENOTTY. QEMU already has a workaround for this, so it's not breaking things in practice, but it would be better to advertise this correctly. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
45c940ba |
|
17-Nov-2016 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcores With POWER9, each CPU thread has its own MMU context and can be in the host or a guest independently of the other threads; there is still however a restriction that all threads must use the same type of address translation, either radix tree or hashed page table (HPT). Since we only support HPT guests on a HPT host at this point, we can treat the threads as being independent, and avoid all of the work of coordinating the CPU threads. To make this simpler, we introduce a new threads_per_vcore() function that returns 1 on POWER9 and threads_per_subcore on POWER7/8, and use that instead of threads_per_subcore or threads_per_core in various places. This also changes the value of the KVM_CAP_PPC_SMT capability on POWER9 systems from 4 to 1, so that userspace will not try to create VMs with multiple vcpus per vcore. (If userspace did create a VM that thought it was in an SMT mode, the VM might try to use the msgsndp instruction, which will not work as expected. In future it may be possible to trap and emulate msgsndp in order to allow VMs to think they are in an SMT mode, if only for the purpose of allowing migration from POWER8 systems.) With all this, we can now run guests on POWER9 as long as the host is running with HPT translation. Since userspace currently has no way to request radix tree translation for the guest, the guest has no choice but to use HPT translation. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
235539b4 |
|
07-Sep-2016 |
Luiz Capitulino <lcapitulino@redhat.com> |
kvm: add stubs for arch specific debugfs support Two stubs are added: o kvm_arch_has_vcpu_debugfs(): must return true if the arch supports creating debugfs entries in the vcpu debugfs dir (which will be implemented by the next commit) o kvm_arch_create_vcpu_debugfs(): code that creates debugfs entries in the vcpu debugfs dir For x86, this commit introduces a new file to avoid growing arch/x86/kvm/x86.c even more. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9576730d |
|
18-Aug-2016 |
Suresh Warrier <warrier@linux.vnet.ibm.com> |
KVM: PPC: select IRQ_BYPASS_MANAGER Select IRQ_BYPASS_MANAGER for PPC when CONFIG_KVM is set. Add the PPC producer functions for add and del producer. [paulus@ozlabs.org - Moved new functions from book3s.c to powerpc.c so booke compiles; added kvm_arch_has_irq_bypass implementation.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
34a75b0f |
|
09-Aug-2016 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Implement kvm_arch_intc_initialized() for PPC It doesn't make sense to create irqfds for a VM that doesn't have in-kernel interrupt controller emulation. There is an existing interface for architecture code to tell the irqfd code whether or not any interrupt controller has been initialized, called kvm_arch_intc_initialized(), so let's implement that for powerpc. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
23528bb2 |
|
19-Jul-2016 |
Sam Bobroff <sam.bobroff@au1.ibm.com> |
KVM: PPC: Introduce KVM_CAP_PPC_HTM Introduce a new KVM capability, KVM_CAP_PPC_HTM, that can be queried to determine if a PowerPC KVM guest should use HTM (Hardware Transactional Memory). This will be used by QEMU to populate the pa-features bits in the guest's device tree. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6edaa530 |
|
15-Jun-2016 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: remove kvm_guest_enter/exit wrappers Use the functions from context_tracking.h directly. Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
eb8b0560 |
|
05-May-2016 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Fix emulated MMIO sign-extension When the guest does a sign-extending load instruction (such as lha or lwa) to an emulated MMIO location, it results in a call to kvmppc_handle_loads() in the host. That function sets the vcpu->arch.mmio_sign_extend flag and calls kvmppc_handle_load() to do the rest of the work. However, kvmppc_handle_load() sets the mmio_sign_extend flag to 0 unconditionally, so the sign extension never gets done. To fix this, we rename kvmppc_handle_load to __kvmppc_handle_load and add an explicit parameter to indicate whether sign extension is required. kvmppc_handle_load() and kvmppc_handle_loads() then become 1-line functions that just call __kvmppc_handle_load() with the extra parameter. Reported-by: Bin Lu <lblulb@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
489153c7 |
|
12-Mar-2016 |
Lan Tianyu <tianyu.lan@intel.com> |
KVM/PPC: update the comment of memory barrier in the kvmppc_prepare_to_enter() The barrier also orders the write to mode from any reads to the page tables done and so update the comment. Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
58ded420 |
|
29-Feb-2016 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Add support for 64bit TCE windows The existing KVM_CREATE_SPAPR_TCE only supports 32bit windows which is not enough for directly mapped windows as the guest can get more than 4GB. This adds KVM_CREATE_SPAPR_TCE_64 ioctl and advertises it via KVM_CAP_SPAPR_TCE_64 capability. The table size is checked against the locked memory limit. Since 64bit windows are to support Dynamic DMA windows (DDW), let's add @bus_offset and @page_shift which are also required by DDW. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
e17769eb |
|
21-Dec-2015 |
Suresh E. Warrier <warrier@linux.vnet.ibm.com> |
KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU This patch adds support to real-mode KVM to search for a core running in the host partition and send it an IPI message with VCPU to be woken. This avoids having to switch to the host partition to complete an H_IPI hypercall when the VCPU which is the target of the the H_IPI is not loaded (is not running in the guest). The patch also includes the support in the IPI handler running in the host to do the wakeup by calling kvmppc_xics_ipi_action for the PPC_MSG_RM_HOST_ACTION message. When a guest is being destroyed, we need to ensure that there are no pending IPIs waiting to wake up a VCPU before we free the VCPUs of the guest. This is accomplished by: - Forces a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs before freeing any VCPUs in kvm_arch_destroy_vm(). - Any PPC_MSG_RM_HOST_ACTION messages must be executed first before any other PPC_MSG_CALL_FUNCTION messages. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
d3695aa4 |
|
14-Feb-2016 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Add support for multiple-TCE hcalls This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO devices or emulated PCI. These calls allow adding multiple entries (up to 512) into the TCE table in one call which saves time on transition between kernel and user space. The current implementation of kvmppc_h_stuff_tce() allows it to be executed in both real and virtual modes so there is one helper. The kvmppc_rm_h_put_tce_indirect() needs to translate the guest address to the host address and since the translation is different, there are 2 helpers - one for each mode. This implements the KVM_CAP_PPC_MULTITCE capability. When present, the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE if these are enabled by the userspace via KVM_CAP_PPC_ENABLE_HCALL. If they can not be handled by the kernel, they are passed on to the user space. The user space still has to have an implementation for these. Both HV and PR-syle KVM are supported. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
b4d7f161 |
|
13-Jan-2016 |
Greg Kurz <groug@kaod.org> |
KVM: PPC: Fix ONE_REG AltiVec support The get and set operations got exchanged by mistake when moving the code from book3s.c to powerpc.c. Fixes: 3840edc8033ad5b86deee309c1c321ca54257452 Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
bfec5c2c |
|
15-Oct-2015 |
Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> |
KVM: PPC: Implement extension to report number of memslots QEMU assumes 32 memslots if this extension is not implemented. Although, current value of KVM_USER_MEM_SLOTS is 32, once KVM_USER_MEM_SLOTS changes QEMU would take a wrong value. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
5358a963 |
|
22-May-2015 |
Thomas Huth <thuth@redhat.com> |
KVM: PPC: Fix warnings from sparse When compiling the KVM code for POWER with "make C=1", sparse complains about functions missing proper prototypes and a 64-bit constant missing the ULL prefix. Let's fix this by making the functions static or by including the proper header with the prototypes, and by appending a ULL prefix to the constant PPC_MPPE_ADDRESS_MASK. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
f36f3f28 |
|
18-May-2015 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: add "new" argument to kvm_arch_commit_memory_region This lets the function access the new memory slot without going through kvm_memslots and id_to_memslot. It will simplify the code when more than one address space will be supported. Unfortunately, the "const"ness of the new argument must be casted away in two places. Fixing KVM to accept const struct kvm_memory_slot pointers would require modifications in pretty much all architectures, and is left for later. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
09170a49 |
|
18-May-2015 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: const-ify uses of struct kvm_userspace_memory_region Architecture-specific helpers are not supposed to muck with struct kvm_userspace_memory_region contents. Add const to enforce this. In order to eliminate the only write in __kvm_set_memory_region, the cleaning of deleted slots is pulled up from update_memslots to __kvm_set_memory_region. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ccf73aaf |
|
30-Apr-2015 |
Christian Borntraeger <borntraeger@de.ibm.com> |
KVM: arm/mips/x86/power use __kvm_guest_{enter|exit} Use __kvm_guest_{enter|exit} instead of kvm_guest_{enter|exit} where interrupts are disabled. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e928e9cb |
|
20-Mar-2015 |
Michael Ellerman <michael@ellerman.id.au> |
KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation. Some PowerNV systems include a hardware random-number generator. This HWRNG is present on POWER7+ and POWER8 chips and is capable of generating one 64-bit random number every microsecond. The random numbers are produced by sampling a set of 64 unstable high-frequency oscillators and are almost completely entropic. PAPR defines an H_RANDOM hypercall which guests can use to obtain one 64-bit random sample from the HWRNG. This adds a real-mode implementation of the H_RANDOM hypercall. This hypercall was implemented in real mode because the latency of reading the HWRNG is generally small compared to the latency of a guest exit and entry for all the threads in the same virtual core. Userspace can detect the presence of the HWRNG and the H_RANDOM implementation by querying the KVM_CAP_PPC_HWRNG capability. The H_RANDOM hypercall implementation will only be invoked when the guest does an H_RANDOM hypercall if userspace first enables the in-kernel H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
e32edf4f |
|
26-Mar-2015 |
Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> |
KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks. This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends on the VCPU that does the access. Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
#
d078eed3 |
|
02-Feb-2015 |
David Gibson <david@gibson.dropbear.id.au> |
powerpc: Cleanup KVM emulated load/store endian handling Sometimes the KVM code on powerpc needs to emulate load or store instructions from the guest, which can include both normal and byte reversed forms. We currently (AFAICT) handle this correctly, but some variable names are very misleading. In particular we use "is_bigendian" in several places to actually mean "is the IO the same endian as the host", but we now support little-endian powerpc hosts. This also ties into the misleadingly named ld_le*() and st_le*() functions, which in fact always byteswap, even on an LE host. This patch cleans this up by renaming to more accurate "host_swabbed", and uses the generic swab*() functions instead of the powerpc specific and misleadingly named ld_le*() and st_le*() functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
31928aa5 |
|
04-Dec-2014 |
Dominik Dingel <dingel@linux.vnet.ibm.com> |
KVM: remove unneeded return value of vcpu_postcreate The return value of kvm_arch_vcpu_postcreate is not checked in its caller. This is okay, because only x86 provides vcpu_postcreate right now and it could only fail if vcpu_load failed. But that is not possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so just get rid of the unchecked return value. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
#
c17b98cf |
|
02-Dec-2014 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Remove code for PPC970 processors This removes the code that was added to enable HV KVM to work on PPC970 processors. The PPC970 is an old CPU that doesn't support virtualizing guest memory. Removing PPC970 support also lets us remove the code for allocating and managing contiguous real-mode areas, the code for the !kvm->arch.using_mmu_notifiers case, the code for pinning pages of guest memory when first accessed and keeping track of which pages have been pinned, and the code for handling H_ENTER hypercalls in virtual mode. Book3S HV KVM is now supported only on POWER7 and POWER8 processors. The KVM_CAP_PPC_RMA capability now always returns 0. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
8d0eff63 |
|
10-Sep-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Pass enum to kvmppc_get_last_inst The kvmppc_get_last_inst function recently received a facelift that allowed us to pass an enum of the type of instruction we want to read into it rather than an unreadable boolean. Unfortunately, not all callers ended up passing the enum. This wasn't really an issue as "true" and "false" happen to match the two enum values we have, but it's still hard to read. Update all callers of kvmppc_get_last_inst() to follow the new calling convention. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
d02d4d15 |
|
01-Sep-2014 |
Mihai Caraman <mihai.caraman@freescale.com> |
KVM: PPC: Remove the tasklet used by the hrtimer Powerpc timer implementation is a copycat version of s390. Now that they removed the tasklet with commit ea74c0ea1b24a6978a6ebc80ba4dbc7b7848b32d follow this optimization. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
3840edc8 |
|
20-Aug-2014 |
Mihai Caraman <mihai.caraman@freescale.com> |
KVM: PPC: Move ONE_REG AltiVec support to powerpc Move ONE_REG AltiVec support to powerpc generic layer. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
8a41ea53 |
|
20-Aug-2014 |
Mihai Caraman <mihai.caraman@freescale.com> |
KVM: PPC: Make ONE_REG powerpc generic Make ONE_REG generic for server and embedded architectures by moving kvm_vcpu_ioctl_get_one_reg() and kvm_vcpu_ioctl_set_one_reg() functions to powerpc layer. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
13a34e06 |
|
28-Aug-2014 |
Radim Krčmář <rkrcmar@redhat.com> |
KVM: remove garbage arg to *hardware_{en,dis}able In the beggining was on_each_cpu(), which required an unused argument to kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten. Remove unnecessary arguments that stem from this. Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0865e636 |
|
28-Aug-2014 |
Radim Krčmář <rkrcmar@redhat.com> |
KVM: static inline empty kvm_arch functions Using static inline is going to save few bytes and cycles. For example on powerpc, the difference is 700 B after stripping. (5 kB before) This patch also deals with two overlooked empty functions: kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c 2df72e9bc KVM: split kvm_arch_flush_shadow and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c. e790d9ef6 KVM: add kvm_arch_sched_in Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e790d9ef |
|
21-Aug-2014 |
Radim Krčmář <rkrcmar@redhat.com> |
KVM: add kvm_arch_sched_in Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch is slightly simpler code and guaranteed call order with respect to kvm_sched_in. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
478d6686 |
|
05-Aug-2014 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: PPC: drop duplicate tracepoint Commit 29577fc00ba4 ("KVM: PPC: HV: Remove generic instruction emulation") caused a build failure with allyesconfig: arch/powerpc/kvm/kvm-pr.o:(__tracepoints+0xa8): multiple definition of `__tracepoint_kvm_ppc_instr' arch/powerpc/kvm/kvm.o:(__tracepoints+0x1c0): first defined here due to a duplicate definition of the tracepoint in trace.h and trace_pr.h. Because the tracepoint is still used by Book3S HV code, and because the PR code does include trace.h, just remove the duplicate definition from trace_pr.h, and export it from kvm.o. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ce91ddc4 |
|
28-Jul-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Remove DCR handling DCR handling was only needed for 440 KVM. Since we removed it, we can also remove handling of DCR accesses. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
d69614a2 |
|
18-Jun-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Separate loadstore emulation from priv emulation Today the instruction emulator can get called via 2 separate code paths. It can either be called by MMIO emulation detection code or by privileged instruction traps. This is bad, as both code paths prepare the environment differently. For MMIO emulation we already know the virtual address we faulted on, so instructions there don't have to actually fetch that information. Split out the two separate use cases into separate files. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
c12fb43c |
|
20-Jun-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Handle magic page in kvmppc_ld/st We use kvmppc_ld and kvmppc_st to emulate load/store instructions that may as well access the magic page. Special case it out so that we can properly access it. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
c45c5514 |
|
20-Jun-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Use kvm_read_guest in kvmppc_ld We have a nice and handy helper to read from guest physical address space, so we should make use of it in kvmppc_ld as we already do for its counterpart in kvmppc_st. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
9897e88a |
|
20-Jun-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Remove kvmppc_bad_hva() We have a proper define for invalid HVA numbers. Use those instead of the ppc specific kvmppc_bad_hva(). Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
35c4a733 |
|
20-Jun-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Move kvmppc_ld/st to common code We have enough common infrastructure now to resolve GVA->GPA mappings at runtime. With this we can move our book3s specific helpers to load / store in guest virtual address space to common code as well. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
7a58777a |
|
14-Jul-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode With Book3S KVM we can create both PR and HV VMs in parallel on the same machine. That gives us new challenges on the CAPs we return - both have different capabilities. When we get asked about CAPs on the kvm fd, there's nothing we can do. We can try to be smart and assume we're running HV if HV is available, PR otherwise. However with the newly added VM CHECK_EXTENSION we can now ask for capabilities directly on a VM which knows whether it's PR or HV. With this patch I can successfully expose KVM PVINFO data to user space in the PR case, fixing magic page mapping for PAPR guests. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
784aa3d7 |
|
14-Jul-2014 |
Alexander Graf <agraf@suse.de> |
KVM: Rename and add argument to check_extension In preparation to make the check_extension function available to VM scope we add a struct kvm * argument to the function header and rename the function accordingly. It will still be called from the /dev/kvm fd, but with a NULL argument for struct kvm *. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b2677b8d |
|
25-Jul-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Remove 440 support The 440 target hasn't been properly functioning for a few releases and before I was the only one who fixes a very serious bug that indicates to me that nobody used it before either. Furthermore KVM on 440 is slow to the extent of unusable. We don't have to carry along completely unused code. Remove 440 and give us one less thing to worry about. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
51f04726 |
|
23-Jul-2014 |
Mihai Caraman <mihai.caraman@freescale.com> |
KVM: PPC: Allow kvmppc_get_last_inst() to fail On book3e, guest last instruction is read on the exit path using load external pid (lwepx) dedicated instruction. This load operation may fail due to TLB eviction and execute-but-not-read entries. This patch lay down the path for an alternative solution to read the guest last instruction, by allowing kvmppc_get_lat_inst() function to fail. Architecture specific implmentations of kvmppc_load_last_inst() may read last guest instruction and instruct the emulation layer to re-execute the guest in case of failure. Make kvmppc_get_last_inst() definition common between architectures. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
89b68c96 |
|
13-Jul-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Book3S: Make magic page properly 4k mappable The magic page is defined as a 4k page of per-vCPU data that is shared between the guest and the host to accelerate accesses to privileged registers. However, when the host is using 64k page size granularity we weren't quite as strict about that rule anymore. Instead, we partially treated all of the upper 64k as magic page and mapped only the uppermost 4k with the actual magic contents. This works well enough for Linux which doesn't use any memory in kernel space in the upper 64k, but Mac OS X got upset. So this patch makes magic page actually stay in a 4k range even on 64k page size hosts. This patch fixes magic page usage with Mac OS X (using MOL) on 64k PAGE_SIZE hosts for me. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
ae2113a4 |
|
01-Jun-2014 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL capability is used to enable or disable in-kernel handling of an hcall, that the hcall is actually implemented by the kernel. If not an EINVAL error is returned. This also checks the default-enabled list of hcalls and prints a warning if any hcall there is not actually implemented. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
699a0ea0 |
|
01-Jun-2014 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling This provides a way for userspace controls which sPAPR hcalls get handled in the kernel. Each hcall can be individually enabled or disabled for in-kernel handling, except for H_RTAS. The exception for H_RTAS is because userspace can already control whether individual RTAS functions are handled in-kernel or not via the KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for H_RTAS is out of the normal sequence of hcall numbers. Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM. The args field of the struct kvm_enable_cap specifies the hcall number in args[0] and the enable/disable flag in args[1]; 0 means disable in-kernel handling (so that the hcall will always cause an exit to userspace) and 1 means enable. Enabling or disabling in-kernel handling of an hcall is effective across the whole VM. The ability for KVM_ENABLE_CAP to be used on a VM file descriptor on PowerPC is new, added by this commit. The KVM_CAP_ENABLE_CAP_VM capability advertises that this ability exists. When a VM is created, an initial set of hcalls are enabled for in-kernel handling. The set that is enabled is the set that have an in-kernel implementation at this point. Any new hcall implementations from this point onwards should not be added to the default set without a good reason. No distinction is made between real-mode and virtual-mode hcall implementations; the one setting controls them both. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
f2e91042 |
|
22-May-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Add CAP to indicate hcall fixes We worked around some nasty KVM magic page hcall breakages: 1) NX bit not honored, so ignore NX when we detect it 2) LE guests swizzle hypercall instruction Without these fixes in place, there's no way it would make sense to expose kvm hypercalls to a guest. Chances are immensely high it would trip over and break. So add a new CAP that gives user space a hint that we have workarounds for the bugs above in place. It can use those as hint to disable PV hypercalls when the guest CPU is anything POWER7 or higher and the host does not have fixes in place. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
f3383cf8 |
|
11-May-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Disable NX for old magic page using guests Old guests try to use the magic page, but map their trampoline code inside of an NX region. Since we can't fix those old kernels, try to detect whether the guest is sane or not. If not, just disable NX functionality in KVM so that old guests at least work at all. For newer guests, add a bit that we can set to keep NX functionality available. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
5deb8e7a |
|
24-Apr-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Make shared struct aka magic page guest endian The shared (magic) page is a data structure that contains often used supervisor privileged SPRs accessible via memory to the user to reduce the number of exits we have to take to read/write them. When we actually share this structure with the guest we have to maintain it in guest endianness, because some of the patch tricks only work with native endian load/store operations. Since we only share the structure with either host or guest in little endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv. For booke, the shared struct stays big endian. For book3s_64 hv we maintain the struct in host native endian, since it never gets shared with the guest. For book3s_64 pr we introduce a variable that tells us which endianness the shared struct is in and route every access to it through helper inline functions that evaluate this variable. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
2743103f |
|
24-Apr-2014 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: PR: Fill pvinfo hcall instructions in big endian We expose a blob of hypercall instructions to user space that it gives to the guest via device tree again. That blob should contain a stream of instructions necessary to do a hypercall in big endian, as it just gets passed into the guest and old guests use them straight away. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
3102f784 |
|
23-May-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/kvm/book3s_hv: Use threads_per_subcore in KVM To support split core on POWER8 we need to modify various parts of the KVM code to use threads_per_subcore instead of threads_per_core. On systems that do not support split core threads_per_subcore == threads_per_core and these changes are a nop. We use threads_per_subcore as the value reported by KVM_CAP_PPC_SMT. This communicates to userspace that guests can only be created with a value of threads_per_core that is less than or equal to the current threads_per_subcore. This ensures that guests can only be created with a thread configuration that we are able to run given the current split core mode. Although threads_per_subcore can change during the life of the system, the commit that enables that will ensure that threads_per_subcore does not change during the life of a KVM VM. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Alexander Graf <agraf@suse.de> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
6c85f52b |
|
09-Jan-2014 |
Scott Wood <scottwood@freescale.com> |
kvm/ppc: IRQ disabling cleanup Simplify the handling of lazy EE by going directly from fully-enabled to hard-disabled. This replaces the lazy_irq_pending() check (including its misplaced kvm_guest_exit() call). As suggested by Tiejun Chen, move the interrupt disabling into kvmppc_prepare_to_enter() rather than have each caller do it. Also move the IRQ enabling on heavyweight exit into kvmppc_prepare_to_enter(). Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
73601775 |
|
09-Jan-2014 |
Cédric Le Goater <clg@fr.ibm.com> |
KVM: PPC: Book3S: MMIO emulation support for little endian guests MMIO emulation reads the last instruction executed by the guest and then emulates. If the guest is running in Little Endian order, or more generally in a different endian order of the host, the instruction needs to be byte-swapped before being emulated. This patch adds a helper routine which tests the endian order of the host and the guest in order to decide whether a byteswap is needed or not. It is then used to byteswap the last instruction of the guest in the endian order of the host before MMIO emulation is performed. Finally, kvmppc_handle_load() of kvmppc_handle_store() are modified to reverse the endianness of the MMIO if required. Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> [agraf: add booke handling] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
efff1912 |
|
15-Oct-2013 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structures This uses struct thread_fp_state and struct thread_vr_state to store the floating-point, VMX/Altivec and VSX state, rather than flat arrays. This makes transferring the state to/from the thread_struct simpler and allows us to unify the get/set_one_reg implementations for the VSX registers. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
a78b55d1 |
|
07-Oct-2013 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
kvm: powerpc: book3s: drop is_hv_enabled drop is_hv_enabled, because that should not be a callback property Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
cbbc58d4 |
|
07-Oct-2013 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
kvm: powerpc: book3s: Allow the HV and PR selection per virtual machine This moves the kvmppc_ops callbacks to be a per VM entity. This enables us to select HV and PR mode when creating a VM. We also allow both kvm-hv and kvm-pr kernel module to be loaded. To achieve this we move /dev/kvm ownership to kvm.ko module. Depending on which KVM mode we select during VM creation we take a reference count on respective module Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: fix coding style] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
5587027c |
|
07-Oct-2013 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
kvm: Add struct kvm arg to memslot APIs We will use that in the later patch to find the kvm ops handler Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
2ba9f0d8 |
|
07-Oct-2013 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
kvm: powerpc: book3s: Support building HV and PR KVM as module Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: squash in compile fix] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
699cc876 |
|
07-Oct-2013 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_ops This help us to identify whether we are running with hypervisor mode KVM enabled. The change is needed so that we can have both HV and PR kvm enabled in the same kernel. If both HV and PR KVM are included, interrupts come in to the HV version of the kvmppc_interrupt code, which then jumps to the PR handler, renamed to kvmppc_interrupt_pr, if the guest is a PR guest. Allowing both PR and HV in the same kernel required some changes to kvm_dev_ioctl_check_extension(), since the values returned now can't be selected with #ifdefs as much as previously. We look at is_hv_enabled to return the right value when checking for capabilities.For capabilities that are only provided by HV KVM, we return the HV value only if is_hv_enabled is true. For capabilities provided by PR KVM but not HV, we return the PR value only if is_hv_enabled is false. NOTE: in later patch we replace is_hv_enabled with a static inline function comparing kvm_ppc_ops Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
3a167bea |
|
07-Oct-2013 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
kvm: powerpc: Add kvmppc_ops callback This patch add a new callback kvmppc_ops. This will help us in enabling both HV and PR KVM together in the same kernel. The actual change to enable them together is done in the later patch in the series. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: squash in booke changes] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
70abaded |
|
30-Aug-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
powerpc kvm: use fdget Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e59dbe09 |
|
03-Jul-2013 |
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> |
KVM: Introduce kvm_arch_memslots_updated() This is called right after the memslots is updated, i.e. when the result of update_memslots() gets installed in install_new_memslots(). Since the memslots needs to be updated twice when we delete or move a memslot, kvm_arch_commit_memory_region() does not correspond to this exactly. In the following patch, x86 will use this new API to check if the mmio generation has reached its maximum value, in which case mmio sptes need to be flushed out. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Acked-by: Alexander Graf <agraf@suse.de> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
5f1c248f |
|
10-Jul-2013 |
Scott Wood <scottwood@freescale.com> |
kvm/ppc: Call trace_hardirqs_on before entry Currently this is only being done on 64-bit. Rather than just move it out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is consistent with lazy ee state, and so that we don't track more host code as interrupts-enabled than necessary. Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect that this function now has a role on 32-bit as well. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
5975a2e0 |
|
26-Apr-2013 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S: Add API for in-kernel XICS emulation This adds the API for userspace to instantiate an XICS device in a VM and connect VCPUs to it. The API consists of a new device type for the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl, which is used to assert and deassert interrupt inputs of the XICS. The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES. Each attribute within this group corresponds to the state of one interrupt source. The attribute number is the same as the interrupt source number. This does not support irq routing or irqfd yet. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
ed840ee9 |
|
26-Apr-2013 |
Scott Wood <scottwood@freescale.com> |
kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write These functions do an srcu_dereference without acquiring the srcu lock themselves. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
bc5ad3f3 |
|
17-Apr-2013 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller This adds in-kernel emulation of the XICS (eXternal Interrupt Controller Specification) interrupt controller specified by PAPR, for both HV and PR KVM guests. The XICS emulation supports up to 1048560 interrupt sources. Interrupt source numbers below 16 are reserved; 0 is used to mean no interrupt and 2 is used for IPIs. Internally these are represented in blocks of 1024, called ICS (interrupt controller source) entities, but that is not visible to userspace. Each vcpu gets one ICP (interrupt controller presentation) entity, used to store the per-vcpu state such as vcpu priority, pending interrupt state, IPI request, etc. This does not include any API or any way to connect vcpus to their ICP state; that will be added in later patches. This is based on an initial implementation by Michael Ellerman <michael@ellerman.id.au> reworked by Benjamin Herrenschmidt and Paul Mackerras. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix typo, add dependency on !KVM_MPIC] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
8e591cb7 |
|
17-Apr-2013 |
Michael Ellerman <michael@ellerman.id.au> |
KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls For pseries machine emulation, in order to move the interrupt controller code to the kernel, we need to intercept some RTAS calls in the kernel itself. This adds an infrastructure to allow in-kernel handlers to be registered for RTAS services by name. A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to associate token values with those service names. Then, when the guest requests an RTAS service with one of those token values, it will be handled by the relevant in-kernel handler rather than being passed up to userspace as at present. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix warning] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
5efdb4be |
|
16-Apr-2013 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: MPIC: Add support for KVM_IRQ_LINE Now that all pieces are in place for reusing generic irq infrastructure, we can copy x86's implementation of KVM_IRQ_LINE irq injection and simply reuse it for PPC, as it will work there just as well. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
eb1e4f43 |
|
12-Apr-2013 |
Scott Wood <scottwood@freescale.com> |
kvm/ppc/mpic: add KVM_CAP_IRQ_MPIC Enabling this capability connects the vcpu to the designated in-kernel MPIC. Using explicit connections between vcpus and irqchips allows for flexibility, but the main benefit at the moment is that it simplifies the code -- KVM doesn't need vm-global state to remember which MPIC object is associated with this vm, and it doesn't need to care about ordering between irqchip creation and vcpu creation. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: add stub functions for kvmppc_mpic_{dis,}connect_vcpu] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
5df554ad |
|
12-Apr-2013 |
Scott Wood <scottwood@freescale.com> |
kvm/ppc/mpic: in-kernel MPIC emulation Hook the MPIC code up to the KVM interfaces, add locking, etc. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
092d62ee |
|
07-Apr-2013 |
Bharat Bhushan <r65777@freescale.com> |
KVM: PPC: debug stub interface parameter defined This patch defines the interface parameter for KVM_SET_GUEST_DEBUG ioctl support. Follow up patches will use this for setting up hardware breakpoints, watchpoints and software breakpoints. Also kvm_arch_vcpu_ioctl_set_guest_debug() is brought one level below. This is because I am not sure what is required for book3s. So this ioctl behaviour will not change for book3s. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
be28a27c |
|
15-Apr-2013 |
Scott Wood <scottwood@freescale.com> |
kvm/ppc: don't call complete_mmio_load when it's a store complete_mmio_load writes back the mmio result into the destination register. Doing this on a store results in register corruption. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
4fe27d2a |
|
14-Feb-2013 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Remove unused argument to kvmppc_core_dequeue_external Currently kvmppc_core_dequeue_external() takes a struct kvm_interrupt * argument and does nothing with it, in any of its implementations. This removes it in order to make things easier for forthcoming in-kernel interrupt controller emulation code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
8482644a |
|
27-Feb-2013 |
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> |
KVM: set_memory_region: Refactor commit_memory_region() This patch makes the parameter old a const pointer to the old memory slot and adds a new parameter named change to know the change being requested: the former is for removing extra copying and the latter is for cleaning up the code. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
7b6195a9 |
|
27-Feb-2013 |
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> |
KVM: set_memory_region: Refactor prepare_memory_region() This patch drops the parameter old, a copy of the old memory slot, and adds a new parameter named change to know the change being requested. This not only cleans up the code but also removes extra copying of the memory slot structure. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
462fce46 |
|
27-Feb-2013 |
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> |
KVM: set_memory_region: Drop user_alloc from prepare/commit_memory_region() X86 does not use this any more. The remaining user, s390's !user_alloc check, can be simply removed since KVM_SET_MEMORY_REGION ioctl is no longer supported. Note: fixed powerpc's indentations with spaces to suppress checkpatch errors. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
1c810636 |
|
04-Jan-2013 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: BookE: Implement EPR exit The External Proxy Facility in FSL BookE chips allows the interrupt controller to automatically acknowledge an interrupt as soon as a core gets its pending external interrupt delivered. Today, user space implements the interrupt controller, so we need to check on it during such a cycle. This patch implements logic for user space to enable EPR exiting, disable EPR exiting and EPR exiting itself, so that user space can acknowledge an interrupt when an external interrupt has successfully been delivered into the guest vcpu. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
5a33169e |
|
14-Dec-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Only WARN on invalid emulation When we hit an emulation result that we didn't expect, that is an error, but it's nothing that warrants a BUG(), because it can be guest triggered. So instead, let's only WARN() the user that this happened. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
f82a8cfe |
|
10-Dec-2012 |
Alex Williamson <alex.williamson@redhat.com> |
KVM: struct kvm_memory_slot.user_alloc -> bool There's no need for this to be an int, it holds a boolean. Move to the end of the struct for alignment. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
a2932923 |
|
19-Nov-2012 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on this fd return the contents of the HPT (hashed page table), writes create and/or remove entries in the HPT. There is a new capability, KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl takes an argument structure with the index of the first HPT entry to read out and a set of flags. The flags indicate whether the user is intending to read or write the HPT, and whether to return all entries or only the "bolted" entries (those with the bolted bit, 0x10, set in the first doubleword). This is intended for use in implementing qemu's savevm/loadvm and for live migration. Therefore, on reads, the first pass returns information about all HPTEs (or all bolted HPTEs). When the first pass reaches the end of the HPT, it returns from the read. Subsequent reads only return information about HPTEs that have changed since they were last read. A read that finds no changed HPTEs in the HPT following where the last read finished will return 0 bytes. The format of the data provides a simple run-length compression of the invalid entries. Each block of data starts with a header that indicates the index (position in the HPT, which is just an array), the number of valid entries starting at that index (may be zero), and the number of invalid entries following those valid entries. The valid entries, 16 bytes each, follow the header. The invalid entries are not explicitly represented. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix documentation] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
0e673fb6 |
|
08-Oct-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Support eventfd In order to support the generic eventfd infrastructure on PPC, we need to call into the generic KVM in-kernel device mmio code. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
42897d86 |
|
27-Nov-2012 |
Marcelo Tosatti <mtosatti@redhat.com> |
KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initialization TSC initialization will soon make use of online_vcpus. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
5bd1cf11 |
|
22-Aug-2012 |
Scott Wood <scottwood@freescale.com> |
KVM: PPC: set IN_GUEST_MODE before checking requests Avoid a race as described in the code comment. Also remove a related smp_wmb() from booke's kvmppc_prepare_to_enter(). I can't see any reason for it, and the book3s_pr version doesn't have it. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
a47d72f3 |
|
20-Sep-2012 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Fix updates of vcpu->cpu This removes the powerpc "generic" updates of vcpu->cpu in load and put, and moves them to the various backends. The reason is that "HV" KVM does its own sauce with that field and the generic updates might corrupt it. The field contains the CPU# of the -first- HW CPU of the core always for all the VCPU threads of a core (the one that's online from a host Linux perspective). However, the preempt notifiers are going to be called on the threads VCPUs when they are running (due to them sleeping on our private waitqueue) causing unload to be called, potentially clobbering the value. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
dfe49dbd |
|
11-Sep-2012 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly This adds an implementation of kvm_arch_flush_shadow_memslot for Book3S HV, and arranges for kvmppc_core_commit_memory_region to flush the dirty log when modifying an existing slot. With this, we can handle deletion and modification of memory slots. kvm_arch_flush_shadow_memslot calls kvmppc_core_flush_memslot, which on Book3S HV now traverses the reverse map chains to remove any HPT (hashed page table) entries referring to pages in the memslot. This gets called by generic code whenever deleting a memslot or changing the guest physical address for a memslot. We flush the dirty log in kvmppc_core_commit_memory_region for consistency with what x86 does. We only need to flush when an existing memslot is being modified, because for a new memslot the rmap array (which stores the dirty bits) is all zero, meaning that every page is considered clean already, and when deleting a memslot we obviously don't care about the dirty bits any more. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
a66b48c3 |
|
11-Sep-2012 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Move kvm->arch.slot_phys into memslot.arch Now that we have an architecture-specific field in the kvm_memory_slot structure, we can use it to store the array of page physical addresses that we need for Book3S HV KVM on PPC970 processors. This reduces the size of struct kvm_arch for Book3S HV, and also reduces the size of struct kvm_arch_memory_slot for other PPC KVM variants since the fields in it are now only compiled in for Book3S HV. This necessitates making the kvm_arch_create_memslot and kvm_arch_free_memslot operations specific to each PPC KVM variant. That in turn means that we now don't allocate the rmap arrays on Book3S PR and Book E. Since we now unpin pages and free the slot_phys array in kvmppc_core_free_memslot, we no longer need to do it in kvmppc_core_destroy_vm, since the generic code takes care to free all the memslots when destroying a VM. We now need the new memslot to be passed in to kvmppc_core_prepare_memory_region, since we need to initialize its arch.slot_phys member on Book3S HV. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
f61c94bb |
|
08-Aug-2012 |
Bharat Bhushan <r65777@freescale.com> |
KVM: PPC: booke: Add watchdog emulation This patch adds the watchdog emulation in KVM. The watchdog emulation is enabled by KVM_ENABLE_CAP(KVM_CAP_PPC_BOOKE_WATCHDOG) ioctl. The kernel timer are used for watchdog emulation and emulates h/w watchdog state machine. On watchdog timer expiry, it exit to QEMU if TCR.WRC is non ZERO. QEMU can reset/shutdown etc depending upon how it is configured. Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com> [bharat.bhushan@freescale.com: reworked patch] Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> [agraf: adjust to new request framework] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
7c973a2e |
|
12-Aug-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Add return value to core_check_requests Requests may want to tell us that we need to go back into host state, so add a return value for the checks. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
7ee78855 |
|
12-Aug-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Add return value in prepare_to_enter Our prepare_to_enter helper wants to be able to return in more circumstances to the host than only when an interrupt is pending. Broaden the interface a bit and move even more generic code to the generic helper. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
206c2ed7 |
|
12-Aug-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Ignore EXITING_GUEST_MODE mode We don't need to do anything when mode is EXITING_GUEST_MODE, because we essentially are outside of guest mode and did everything it asked us to do by the time we check it. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
3766a4c6 |
|
12-Aug-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Move kvm_guest_enter call into generic code We need to call kvm_guest_enter in booke and book3s, so move its call to generic code. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
bd2be683 |
|
12-Aug-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Book3S: PR: Rework irq disabling Today, we disable preemption while inside guest context, because we need to expose to the world that we are not in a preemptible context. However, during that time we already have interrupts disabled, which would indicate that we are in a non-preemptible context. The reason the checks for irqs_disabled() fail for us though is that we manually control hard IRQs and ignore all the lazy EE framework. Let's stop doing that. Instead, let's always use lazy EE to indicate when we want to disable IRQs, but do a special final switch that gets us into EE disabled, but soft enabled state. That way when we get back out of guest state, we are immediately ready to process interrupts. This simplifies the code drastically and reduces the time that we appear as preempt disabled. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
03d25c5b |
|
09-Aug-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Use same kvmppc_prepare_to_enter code for booke and book3s_pr We need to do the same things when preparing to enter a guest for booke and book3s_pr cores. Fold the generic code into a generic function that both call. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
f4800b1f |
|
07-Aug-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Expose SYNC cap based on mmu notifiers Semantically, the "SYNC" cap means that we have mmu notifiers available. Express this in our #ifdef'ery around the feature, so that we can be sure we don't miss out on ppc targets when they get their implementation. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
9202e076 |
|
02-Jul-2012 |
Liu Yu-B13201 <Yu.Liu@freescale.com> |
KVM: PPC: Add support for ePAPR idle hcall in host kernel And add a new flag definition in kvm_ppc_pvinfo to indicate whether the host supports the EV_IDLE hcall. Signed-off-by: Liu Yu <yu.liu@freescale.com> [stuart.yoder@freescale.com: cleanup,fixes for conditions allowing idle] Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> [agraf: fix typo] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
784bafac |
|
02-Jul-2012 |
Stuart Yoder <stuart.yoder@freescale.com> |
KVM: PPC: add pvinfo for hcall opcodes on e500mc/e5500 Signed-off-by: Liu Yu <yu.liu@freescale.com> [stuart: factored this out from idle hcall support in host patch] Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
fdcf8bd7 |
|
02-Jul-2012 |
Stuart Yoder <stuart.yoder@freescale.com> |
KVM: PPC: use definitions in epapr header for hcalls Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
2df72e9b |
|
24-Aug-2012 |
Marcelo Tosatti <mtosatti@redhat.com> |
KVM: split kvm_arch_flush_shadow Introducing kvm_arch_flush_shadow_memslot, to invalidate the translations of a single memory slot. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
d89cc617 |
|
01-Aug-2012 |
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> |
KVM: Push rmap into kvm_arch_memory_slot Two reasons: - x86 can integrate rmap and rmap_pde and remove heuristics in __gfn_to_rmap(). - Some architectures do not need rmap. Since rmap is one of the most memory consuming stuff in KVM, ppc'd better restrict the allocation to Book3S HV. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
32fad281 |
|
03-May-2012 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Make the guest hash table size configurable This adds a new ioctl to enable userspace to control the size of the guest hashed page table (HPT) and to clear it out when resetting the guest. The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter a pointer to a u32 containing the desired order of the HPT (log base 2 of the size in bytes), which is updated on successful return to the actual order of the HPT which was allocated. There must be no vcpus running at the time of this ioctl. To enforce this, we now keep a count of the number of vcpus running in kvm->arch.vcpus_running. If the ioctl is called when a HPT has already been allocated, we don't reallocate the HPT but just clear it out. We first clear the kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold the kvm->lock mutex, it will prevent any vcpus from starting to run until we're done, and (b) it means that the first vcpu to run after we're done will re-establish the VRMA if necessary. If userspace doesn't call this ioctl before running the first vcpu, the kernel will allocate a default-sized HPT at that point. We do it then rather than when creating the VM, as the code did previously, so that userspace has a chance to do the ioctl if it wants. When allocating the HPT, we can allocate either from the kernel page allocator, or from the preallocated pool. If userspace is asking for a different size from the preallocated HPTs, we first try to allocate using the kernel page allocator. Then we try to allocate from the preallocated pool, and then if that fails, we try allocating decreasing sizes from the kernel page allocator, down to the minimum size allowed (256kB). Note that the kernel page allocator limits allocations to 1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to 16MB (on 64-bit powerpc, at least). Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix module compilation] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
5b74716e |
|
26-Apr-2012 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
kvm/powerpc: Add new ioctl to retreive server MMU infos This is necessary for qemu to be able to pass the right information to the guest, such as the supported page sizes and corresponding encodings in the SLB and hash table, which can vary depending on the processor type, the type of KVM used (PR vs HV) and the version of KVM Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [agraf: fix compilation on hv, adjust for newer ioctl numbers] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
f31e65e1 |
|
15-Mar-2012 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM There is nothing in the code for emulating TCE tables in the kernel that prevents it from working on "PR" KVM... other than ifdef's and location of the code. This and moves the bulk of the code there to a new file called book3s_64_vio.c. This speeds things up a bit on my G5. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [agraf: fix for hv kvm, 32bit, whitespace] Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
bf7ca4bd |
|
15-Feb-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: rename CONFIG_KVM_E500 -> CONFIG_KVM_E500V2 The CONFIG_KVM_E500 option really indicates that we're running on a V2 machine, not on a machine of the generic E500 class. So indicate that properly and change the config name accordingly. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
73196cd3 |
|
20-Dec-2011 |
Scott Wood <scottwood@freescale.com> |
KVM: PPC: e500mc support Add processor support for e500mc, using hardware virtualization support (GS-mode). Current issues include: - No support for external proxy (coreint) interrupt mode in the guest. Includes work by Ashish Kalra <Ashish.Kalra@freescale.com>, Varun Sethi <Varun.Sethi@freescale.com>, and Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
d30f6e48 |
|
20-Dec-2011 |
Scott Wood <scottwood@freescale.com> |
KVM: PPC: booke: category E.HV (GS-mode) support Chips such as e500mc that implement category E.HV in Power ISA 2.06 provide hardware virtualization features, including a new MSR mode for guest state. The guest OS can perform many operations without trapping into the hypervisor, including transitions to and from guest userspace. Since we can use SRR1[GS] to reliably tell whether an exception came from guest state, instead of messing around with IVPR, we use DO_KVM similarly to book3s. Current issues include: - Machine checks from guest state are not routed to the host handler. - The guest can cause a host oops by executing an emulated instruction in a page that lacks read permission. Existing e500/4xx support has the same problem. Includes work by Ashish Kalra <Ashish.Kalra@freescale.com>, Varun Sethi <Varun.Sethi@freescale.com>, and Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: remove pt_regs usage] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
043cc4d7 |
|
20-Dec-2011 |
Scott Wood <scottwood@freescale.com> |
KVM: PPC: factor out lpid allocator from book3s_64_mmu_hv We'll use it on e500mc as well. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
b6d33834 |
|
08-Mar-2012 |
Christoffer Dall <c.dall@virtualopensystems.com> |
KVM: Factor out kvm_vcpu_kick to arch-generic code The kvm_vcpu_kick function performs roughly the same funcitonality on most all architectures, so we shouldn't have separate copies. PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch structure and to accomodate this special need a __KVM_HAVE_ARCH_VCPU_GET_WQ define and accompanying function kvm_arch_vcpu_wq have been defined. For all other architectures this is a generic inline that just returns &vcpu->wq; Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
db3fe4eb |
|
07-Feb-2012 |
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> |
KVM: Introduce kvm_memory_slot::arch and move lpage_info into it Some members of kvm_memory_slot are not used by every architecture. This patch is the first step to make this difference clear by introducing kvm_memory_slot::arch; lpage_info is moved into it. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
b3c5d3c2 |
|
06-Jan-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Rename MMIO register identifiers We need the KVM_REG namespace for generic register settings now, so let's rename the existing users to something different, enabling us to reuse the namespace for more visible interfaces. While at it, also move these private constants to a private header. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
31f3438e |
|
11-Dec-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Move kvm_vcpu_ioctl_[gs]et_one_reg down to platform-specific code This moves the get/set_one_reg implementation down from powerpc.c into booke.c, book3s_pr.c and book3s_hv.c. This avoids #ifdefs in C code, but more importantly, it fixes a bug on Book3s HV where we were accessing beyond the end of the kvm_vcpu struct (via the to_book3s() macro) and corrupting memory, causing random crashes and file corruption. On Book3s HV we only accept setting the HIOR to zero, since the guest runs in supervisor mode and its vectors are never offset from zero. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> [agraf update to apply on top of changed ONE_REG patches] Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
1022fc3d |
|
14-Sep-2011 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Add support for explicit HIOR setting Until now, we always set HIOR based on the PVR, but this is just wrong. Instead, we should be setting HIOR explicitly, so user space can decide what the initial HIOR value is - just like on real hardware. We keep the old PVR based way around for backwards compatibility, but once user space uses the SET_ONE_REG based method, we drop the PVR logic. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
e24ed81f |
|
14-Sep-2011 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Add generic single register ioctls Right now we transfer a static struct every time we want to get or set registers. Unfortunately, over time we realize that there are more of these than we thought of before and the extensibility and flexibility of transferring a full struct every time is limited. So this is a new approach to the problem. With these new ioctls, we can get and set a single register that is identified by an ID. This allows for very precise and limited transmittal of data. When we later realize that it's a better idea to shove over multiple registers at once, we can reuse most of the infrastructure and simply implement a GET_MANY_REGS / SET_MANY_REGS interface. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
b5434032 |
|
07-Dec-2011 |
Matt Evans <matt@ozlabs.org> |
KVM: PPC: Add KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS PPC KVM lacks these two capabilities, and as such a userland system must assume a max of 4 VCPUs (following api.txt). With these, a userland can determine a more realistic limit. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
03cdab53 |
|
06-Dec-2011 |
Matt Evans <matt@ozlabs.org> |
KVM: PPC: Fix vcpu_create dereference before validity check. Fix usage of vcpu struct before check that it's actually valid. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
342d3db7 |
|
11-Dec-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Implement MMU notifiers for Book3S HV guests This adds the infrastructure to enable us to page out pages underneath a Book3S HV guest, on processors that support virtualized partition memory, that is, POWER7. Instead of pinning all the guest's pages, we now look in the host userspace Linux page tables to find the mapping for a given guest page. Then, if the userspace Linux PTE gets invalidated, kvm_unmap_hva() gets called for that address, and we replace all the guest HPTEs that refer to that page with absent HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit set, which will cause an HDSI when the guest tries to access them. Finally, the page fault handler is extended to reinstantiate the guest HPTE when the guest tries to access a page which has been paged out. Since we can't intercept the guest DSI and ISI interrupts on PPC970, we still have to pin all the guest pages on PPC970. We have a new flag, kvm->arch.using_mmu_notifiers, that indicates whether we can page guest pages out. If it is not set, the MMU notifier callbacks do nothing and everything operates as before. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
4e72dbe1 |
|
11-Dec-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Make wakeups work again for Book3S HV guests When commit f43fdc15fa ("KVM: PPC: booke: Improve timer register emulation") factored out some code in arch/powerpc/kvm/powerpc.c into a new helper function, kvm_vcpu_kick(), an error crept in which causes Book3s HV guest vcpus to stall. This fixes it. On POWER7 machines, guest vcpus are grouped together into virtual CPU cores that share a single waitqueue, so it's important to use vcpu->arch.wqp rather than &vcpu->wq. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
ae21216b |
|
09-Dec-2011 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: align vcpu_kick with x86 Our vcpu kick implementation differs a bit from x86 which resulted in us not disabling preemption during the kick. Get it a bit closer to what x86 does. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
dfd4d47e |
|
16-Nov-2011 |
Scott Wood <scottwood@freescale.com> |
KVM: PPC: booke: Improve timer register emulation Decrementers are now properly driven by TCR/TSR, and the guest has full read/write access to these registers. The decrementer keeps ticking (and setting the TSR bit) regardless of whether the interrupts are enabled with TCR. The decrementer stops at zero, rather than going negative. Decrementers (and FITs, once implemented) are delivered as level-triggered interrupts -- dequeued when the TSR bit is cleared, not on delivery. Signed-off-by: Liu Yu <yu.liu@freescale.com> [scottwood@freescale.com: significant changes] Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
b5904972 |
|
08-Nov-2011 |
Scott Wood <scottwood@freescale.com> |
KVM: PPC: Paravirtualize SPRG4-7, ESR, PIR, MASn This allows additional registers to be accessed by the guest in PR-mode KVM without trapping. SPRG4-7 are readable from userspace. On booke, KVM will sync these registers when it enters the guest, so that accesses from guest userspace will work. The guest kernel, OTOH, must consistently use either the real registers or the shared area between exits. This also applies to the already-paravirted SPRG3. On non-booke, it's not clear to what extent SPRG4-7 are supported (they're not architected for book3s, but exist on at least some classic chips). They are copied in the get/set regs ioctls, but I do not see any non-booke emulation. I also do not see any syncing with real registers (in PR-mode) including the user-readable SPRG3. This patch should not make that situation any worse. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
25051b5a |
|
08-Nov-2011 |
Scott Wood <scottwood@freescale.com> |
KVM: PPC: Move prepare_to_enter call site into subarch code This function should be called with interrupts disabled, to avoid a race where an exception is delivered after we check, but the resched kick is received before we disable interrupts (and thus doesn't actually trigger the exit code that would recheck exceptions). booke already does this properly in the lightweight exit case, but not on initial entry. For now, move the call of prepare_to_enter into subarch-specific code so that booke can do the right thing here. Ideally book3s would do the same thing, but I'm having a hard time seeing where it does any interrupt disabling of this sort (plus it has several additional call sites), so I'm deferring the book3s fix to someone more familiar with that code. book3s behavior should be unchanged by this patch. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
7e28e60e |
|
08-Nov-2011 |
Scott Wood <scottwood@freescale.com> |
KVM: PPC: Rename deliver_interrupts to prepare_to_enter This function also updates paravirt int_pending, so rename it to be more obvious that this is a collection of checks run prior to (re)entering a guest. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
dc83b8bc |
|
18-Aug-2011 |
Scott Wood <scottwood@freescale.com> |
KVM: PPC: e500: MMU API This implements a shared-memory API for giving host userspace access to the guest's TLB. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
5b1c1493 |
|
04-Jan-2012 |
Carsten Otte <cotte@de.ibm.com> |
KVM: s390: ucontrol: export SIE control block to user This patch exports the s390 SIE hardware control block to userspace via the mapping of the vcpu file descriptor. In order to do so, a new arch callback named kvm_arch_vcpu_fault is introduced for all architectures. It allows to map architecture specific pages. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
e08b9637 |
|
04-Jan-2012 |
Carsten Otte <cotte@de.ibm.com> |
KVM: s390: add parameter for KVM_CREATE_VM This patch introduces a new config option for user controlled kernel virtual machines. It introduces a parameter to KVM_CREATE_VM that allows to set bits that alter the capabilities of the newly created virtual machine. The parameter is passed to kvm_arch_init_vm for all architectures. The only valid modifier bit for now is KVM_VM_S390_UCONTROL. This requires CAP_SYS_ADMIN privileges and creates a user controlled virtual machine on s390 architectures. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
bb75c627 |
|
17-Nov-2011 |
Alexander Graf <agraf@suse.de> |
Revert "KVM: PPC: Add support for explicit HIOR setting" This reverts commit a15bd354f083f20f257db450488db52ac27df439. It exceeded the padding on the SREGS struct, rendering the ABI backwards-incompatible. Conflicts: arch/powerpc/kvm/powerpc.c include/linux/kvm.h Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
ead53f22 |
|
22-Jul-2011 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
powerpc: remove non-required uses of include <linux/module.h> None of the files touched here are modules, and they are not exporting any symbols either -- so there is no need to be including the module.h. Builds of all the files remains successful. Even kernel/module.c does not need to include it, since it includes linux/moduleloader.h instead. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
19ccb76a |
|
23-Jul-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per core), whenever a CPU goes idle, we have to pull all the other hardware threads in the core out of the guest, because the H_CEDE hcall is handled in the kernel. This is inefficient. This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall in real mode. When a guest vcpu does an H_CEDE hcall, we now only exit to the kernel if all the other vcpus in the same core are also idle. Otherwise we mark this vcpu as napping, save state that could be lost in nap mode (mainly GPRs and FPRs), and execute the nap instruction. When the thread wakes up, because of a decrementer or external interrupt, we come back in at kvm_start_guest (from the system reset interrupt vector), find the `napping' flag set in the paca, and go to the resume path. This has some other ramifications. First, when starting a core, we now start all the threads, both those that are immediately runnable and those that are idle. This is so that we don't have to pull all the threads out of the guest when an idle thread gets a decrementer interrupt and wants to start running. In fact the idle threads will all start with the H_CEDE hcall returning; being idle they will just do another H_CEDE immediately and go to nap mode. This required some changes to kvmppc_run_core() and kvmppc_run_vcpu(). These functions have been restructured to make them simpler and clearer. We introduce a level of indirection in the wait queue that gets woken when external and decrementer interrupts get generated for a vcpu, so that we can have the 4 vcpus in a vcore using the same wait queue. We need this because the 4 vcpus are being handled by one thread. Secondly, when we need to exit from the guest to the kernel, we now have to generate an IPI for any napping threads, because an HDEC interrupt doesn't wake up a napping thread. Thirdly, we now need to be able to handle virtual external interrupts and decrementer interrupts becoming pending while a thread is napping, and deliver those interrupts to the guest when the thread wakes. This is done in kvmppc_cede_reentry, just before fast_guest_return. Finally, since we are not using the generic kvm_vcpu_block for book3s_hv, and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef from kvm_arch_vcpu_runnable. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
af8f38b3 |
|
10-Aug-2011 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Add sanity checking to vcpu_run There are multiple features in PowerPC KVM that can now be enabled depending on the user's wishes. Some of the combinations don't make sense or don't work though. So this patch adds a way to check if the executing environment would actually be able to run the guest properly. It also adds sanity checks if PVR is set (should always be true given the current code flow), if PAPR is only used with book3s_64 where it works and that HV KVM is only used in PAPR mode. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
930b412a |
|
08-Aug-2011 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Enable the PAPR CAP for Book3S Now that Book3S PV mode can also run PAPR guests, we can add a PAPR cap and enable it for all Book3S targets. Enabling that CAP switches KVM into PAPR mode. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
a15bd354 |
|
08-Aug-2011 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Add support for explicit HIOR setting Until now, we always set HIOR based on the PVR, but this is just wrong. Instead, we should be setting HIOR explicitly, so user space can decide what the initial HIOR value is - just like on real hardware. We keep the old PVR based way around for backwards compatibility, but once user space uses the SREGS based method, we drop the PVR logic. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
9e368f29 |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: book3s_hv: Add support for PPC970-family processors This adds support for running KVM guests in supervisor mode on those PPC970 processors that have a usable hypervisor mode. Unfortunately, Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to 1), but the YDL PowerStation does have a usable hypervisor mode. There are several differences between the PPC970 and POWER7 in how guests are managed. These differences are accommodated using the CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature bits. Notably, on PPC970: * The LPCR, LPID or RMOR registers don't exist, and the functions of those registers are provided by bits in HID4 and one bit in HID0. * External interrupts can be directed to the hypervisor, but unlike POWER7 they are masked by MSR[EE] in non-hypervisor modes and use SRR0/1 not HSRR0/1. * There is no virtual RMA (VRMA) mode; the guest must use an RMO (real mode offset) area. * The TLB entries are not tagged with the LPID, so it is necessary to flush the whole TLB on partition switch. Furthermore, when switching partitions we have to ensure that no other CPU is executing the tlbie or tlbsync instructions in either the old or the new partition, otherwise undefined behaviour can occur. * The PMU has 8 counters (PMC registers) rather than 6. * The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist. * The SLB has 64 entries rather than 32. * There is no mediated external interrupt facility, so if we switch to a guest that has a virtual external interrupt pending but the guest has MSR[EE] = 0, we have to arrange to have an interrupt pending for it so that we can get control back once it re-enables interrupts. We do that by sending ourselves an IPI with smp_send_reschedule after hard-disabling interrupts. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
aa04b4cc |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests This adds infrastructure which will be needed to allow book3s_hv KVM to run on older POWER processors, including PPC970, which don't support the Virtual Real Mode Area (VRMA) facility, but only the Real Mode Offset (RMO) facility. These processors require a physically contiguous, aligned area of memory for each guest. When the guest does an access in real mode (MMU off), the address is compared against a limit value, and if it is lower, the address is ORed with an offset value (from the Real Mode Offset Register (RMOR)) and the result becomes the real address for the access. The size of the RMA has to be one of a set of supported values, which usually includes 64MB, 128MB, 256MB and some larger powers of 2. Since we are unlikely to be able to allocate 64MB or more of physically contiguous memory after the kernel has been running for a while, we allocate a pool of RMAs at boot time using the bootmem allocator. The size and number of the RMAs can be set using the kvm_rma_size=xx and kvm_rma_count=xx kernel command line options. KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability of the pool of preallocated RMAs. The capability value is 1 if the processor can use an RMA but doesn't require one (because it supports the VRMA facility), or 2 if the processor requires an RMA for each guest. This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the pool and returns a file descriptor which can be used to map the RMA. It also returns the size of the RMA in the argument structure. Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION ioctl calls from userspace. To cope with this, we now preallocate the kvm->arch.ram_pginfo array when the VM is created with a size sufficient for up to 64GB of guest memory. Subsequently we will get rid of this array and use memory associated with each memslot instead. This moves most of the code that translates the user addresses into host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level to kvmppc_core_prepare_memory_region. Also, instead of having to look up the VMA for each page in order to check the page size, we now check that the pages we get are compound pages of 16MB. However, if we are adding memory that is mapped to an RMA, we don't bother with calling get_user_pages_fast and instead just offset from the base pfn for the RMA. Typically the RMA gets added after vcpus are created, which makes it inconvenient to have the LPCR (logical partition control register) value in the vcpu->arch struct, since the LPCR controls whether the processor uses RMA or VRMA for the guest. This moves the LPCR value into the kvm->arch struct and arranges for the MER (mediated external request) bit, which is the only bit that varies between vcpus, to be set in assembly code when going into the guest if there is a pending external interrupt request. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
371fefd6 |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Allow book3s_hv guests to use SMT processor modes This lifts the restriction that book3s_hv guests can only run one hardware thread per core, and allows them to use up to 4 threads per core on POWER7. The host still has to run single-threaded. This capability is advertised to qemu through a new KVM_CAP_PPC_SMT capability. The return value of the ioctl querying this capability is the number of vcpus per virtual CPU core (vcore), currently 4. To use this, the host kernel should be booted with all threads active, and then all the secondary threads should be offlined. This will put the secondary threads into nap mode. KVM will then wake them from nap mode and use them for running guest code (while they are still offline). To wake the secondary threads, we send them an IPI using a new xics_wake_cpu() function, implemented in arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage we assume that the platform has a XICS interrupt controller and we are using icp-native.c to drive it. Since the woken thread will need to acknowledge and clear the IPI, we also export the base physical address of the XICS registers using kvmppc_set_xics_phys() for use in the low-level KVM book3s code. When a vcpu is created, it is assigned to a virtual CPU core. The vcore number is obtained by dividing the vcpu number by the number of threads per core in the host. This number is exported to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes to run the guest in single-threaded mode, it should make all vcpu numbers be multiples of the number of threads per core. We distinguish three states of a vcpu: runnable (i.e., ready to execute the guest), blocked (that is, idle), and busy in host. We currently implement a policy that the vcore can run only when all its threads are runnable or blocked. This way, if a vcpu needs to execute elsewhere in the kernel or in qemu, it can do so without being starved of CPU by the other vcpus. When a vcore starts to run, it executes in the context of one of the vcpu threads. The other vcpu threads all go to sleep and stay asleep until something happens requiring the vcpu thread to return to qemu, or to wake up to run the vcore (this can happen when another vcpu thread goes from busy in host state to blocked). It can happen that a vcpu goes from blocked to runnable state (e.g. because of an interrupt), and the vcore it belongs to is already running. In that case it can start to run immediately as long as the none of the vcpus in the vcore have started to exit the guest. We send the next free thread in the vcore an IPI to get it to start to execute the guest. It synchronizes with the other threads via the vcore->entry_exit_count field to make sure that it doesn't go into the guest if the other vcpus are exiting by the time that it is ready to actually enter the guest. Note that there is no fixed relationship between the hardware thread number and the vcpu number. Hardware threads are assigned to vcpus as they become runnable, so we will always use the lower-numbered hardware threads in preference to higher-numbered threads if not all the vcpus in the vcore are runnable, regardless of which vcpus are runnable. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
54738c09 |
|
28-Jun-2011 |
David Gibson <dwg@au1.ibm.com> |
KVM: PPC: Accelerate H_PUT_TCE by implementing it in real mode This improves I/O performance for guests using the PAPR paravirtualization interface by making the H_PUT_TCE hcall faster, by implementing it in real mode. H_PUT_TCE is used for updating virtual IOMMU tables, and is used both for virtual I/O and for real I/O in the PAPR interface. Since this moves the IOMMU tables into the kernel, we define a new KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables. The ioctl returns a file descriptor which can be used to mmap the newly created table. The qemu driver models use them in the same way as userspace managed tables, but they can be updated directly by the guest with a real-mode H_PUT_TCE implementation, reducing the number of host/guest context switches during guest IO. There are certain circumstances where it is useful for userland qemu to write to the TCE table even if the kernel H_PUT_TCE path is used most of the time. Specifically, allowing this will avoid awkwardness when we need to reset the table. More importantly, we will in the future need to write the table in order to restore its state after a checkpoint resume or migration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
a8606e20 |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Handle some PAPR hcalls in the kernel This adds the infrastructure for handling PAPR hcalls in the kernel, either early in the guest exit path while we are still in real mode, or later once the MMU has been turned back on and we are in the full kernel context. The advantage of handling hcalls in real mode if possible is that we avoid two partition switches -- and this will become more important when we support SMT4 guests, since a partition switch means we have to pull all of the threads in the core out of the guest. The disadvantage is that we can only access the kernel linear mapping, not anything vmalloced or ioremapped, since the MMU is off. This also adds code to handle the following hcalls in real mode: H_ENTER Add an HPTE to the hashed page table H_REMOVE Remove an HPTE from the hashed page table H_READ Read HPTEs from the hashed page table H_PROTECT Change the protection bits in an HPTE H_BULK_REMOVE Remove up to 4 HPTEs from the hashed page table H_SET_DABR Set the data address breakpoint register Plus code to handle the following hcalls in the kernel: H_CEDE Idle the vcpu until an interrupt or H_PROD hcall arrives H_PROD Wake up a ceded vcpu H_REGISTER_VPA Register a virtual processor area (VPA) The code that runs in real mode has to be in the base kernel, not in the module, if KVM is compiled as a module. The real-mode code can only access the kernel linear mapping, not vmalloc or ioremap space. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
de56a948 |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Add support for Book3S processors in hypervisor mode This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
df6909e5 |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Move guest enter/exit down into subarch-specific code Instead of doing the kvm_guest_enter/exit() and local_irq_dis/enable() calls in powerpc.c, this moves them down into the subarch-specific book3s_pr.c and booke.c. This eliminates an extra local_irq_enable() call in book3s_pr.c, and will be needed for when we do SMT4 guest support in the book3s hypervisor mode code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
f9e0554d |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Pass init/destroy vm and prepare/commit memory region ops down This arranges for the top-level arch/powerpc/kvm/powerpc.c file to pass down some of the calls it gets to the lower-level subarchitecture specific code. The lower-level implementations (in booke.c and book3s.c) are no-ops. The coming book3s_hv.c will need this. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
a4cd8b23 |
|
14-Jun-2011 |
Scott Wood <scottwood@freescale.com> |
KVM: PPC: e500: enable magic page This is a shared page used for paravirtualization. It is always present in the guest kernel's effective address space at the address indicated by the hypercall that enables it. The physical address specified by the hypercall is not used, as e500 does not have real mode. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
5ce941ee |
|
27-Apr-2011 |
Scott Wood <scottwood@freescale.com> |
KVM: PPC: booke: add sregs support Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
eab17672 |
|
27-Apr-2011 |
Scott Wood <scottwood@freescale.com> |
KVM: PPC: booke: save/restore VRSAVE (a.k.a. USPRG0) Linux doesn't use USPRG0 (now renamed VRSAVE in the architecture, even when Altivec isn't involved), but a guest might. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
09000adb |
|
24-Mar-2011 |
Bharat Bhushan <r65777@freescale.com> |
KVM: PPC: Fix issue clearing exit timing counters Following dump is observed on host when clearing the exit timing counters [root@p1021mds kvm]# echo -n 'c' > vm1200_vcpu0_timing INFO: task echo:1276 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. echo D 0ff5bf94 0 1276 1190 0x00000000 Call Trace: [c2157e40] [c0007908] __switch_to+0x9c/0xc4 [c2157e50] [c040293c] schedule+0x1b4/0x3bc [c2157e90] [c04032dc] __mutex_lock_slowpath+0x74/0xc0 [c2157ec0] [c00369e4] kvmppc_init_timing_stats+0x20/0xb8 [c2157ed0] [c0036b00] kvmppc_exit_timing_write+0x84/0x98 [c2157ef0] [c00b9f90] vfs_write+0xc0/0x16c [c2157f10] [c00ba284] sys_write+0x4c/0x90 [c2157f40] [c000e320] ret_from_syscall+0x0/0x3c The vcpu->mutex is used by kvm_ioctl_* (KVM_RUN etc) and same was used when clearing the stats (in kvmppc_init_timing_stats()). What happens is that when the guest is idle then it held the vcpu->mutx. While the exiting timing process waits for guest to release the vcpu->mutex and a hang state is reached. Now using seprate lock for exit timing stats. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
d89f5eff |
|
09-Nov-2010 |
Jan Kiszka <jan.kiszka@siemens.com> |
KVM: Clean up vm creation and release IA64 support forces us to abstract the allocation of the kvm structure. But instead of mixing this up with arch-specific initialization and doing the same on destruction, split both steps. This allows to move generic destruction calls into generic code. It also fixes error clean-up on failures of kvm_create_vm for IA64. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
d8cdddcd |
|
30-Oct-2010 |
Vasiliy Kulikov <segooon@gmail.com> |
KVM: PPC: fix information leak to userland Structure kvm_ppc_pvinfo is copied to userland with flags and pad fields unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
7b4203e8 |
|
30-Aug-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Expose level based interrupt cap Now that we have all the level interrupt magic in place, let's expose the capability to user space, so it can make use of it! Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
df1bfa25 |
|
02-Aug-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Put segment registers in shared page Now that the actual mtsr doesn't do anything anymore, we can move the sr contents over to the shared page, so a guest can directly read and write its sr contents from guest context. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
7508e16c |
|
03-Aug-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Add feature bitmap for magic page We will soon add SR PV support to the shared page, so we need some infrastructure that allows the guest to query for features KVM exports. This patch adds a second return value to the magic mapping that indicated to the guest which features are available. Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
15711e9c |
|
29-Jul-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Add get_pvinfo interface to query hypercall instructions We need to tell the guest the opcodes that make up a hypercall through interfaces that are controlled by userspace. So we need to add a call for userspace to allow it to query those opcodes so it can pass them on. This is required because the hypercall opcodes can change based on the hypervisor conditions. If we're running in hardware accelerated hypervisor mode, a hypercall looks different from when we're running without hardware acceleration. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
5fc87407 |
|
29-Jul-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Expose magic page support to guest Now that we have the shared page in place and the MMU code knows about the magic page, we can expose that capability to the guest! Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
2a342ed5 |
|
29-Jul-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Implement hypervisor interface To communicate with KVM directly we need to plumb some sort of interface between the guest and KVM. Usually those interfaces use hypercalls. This hypercall implementation is described in the last patch of the series in a special documentation file. Please read that for further information. This patch implements stubs to handle KVM PPC hypercalls on the host and guest side alike. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
666e7252 |
|
29-Jul-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Convert MSR to shared page One of the most obvious registers to share with the guest directly is the MSR. The MSR contains the "interrupts enabled" flag which the guest has to toggle in critical sections. So in order to bring the overhead of interrupt en- and disabling down, let's put msr into the shared page. Keep in mind that even though you can fully read its contents, writing to it doesn't always update all state. There are a few safe fields that don't require hypervisor interaction. See the documentation for a list of MSR bits that are safe to be set from inside the guest. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
a1f4d395 |
|
21-Jun-2010 |
Avi Kivity <avi@redhat.com> |
KVM: Remove memory alias support As advertised in feature-removal-schedule.txt. Equivalent support is provided by overlapping memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
69b61833 |
|
11-Jun-2010 |
Denis Kirjanov <dkirjanov@hera.kernel.org> |
KVM: PPC: fix build warning in kvm_arch_vcpu_ioctl_run Fix compile warning: CC [M] arch/powerpc/kvm/powerpc.o arch/powerpc/kvm/powerpc.c: In function 'kvm_arch_vcpu_ioctl_run': arch/powerpc/kvm/powerpc.c:290: warning: 'gpr' may be used uninitialized in this function arch/powerpc/kvm/powerpc.c:290: note: 'gpr' was declared here Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
93736624 |
|
12-May-2010 |
Avi Kivity <avi@redhat.com> |
KVM: Consolidate arch specific vcpu ioctl locking Now that all arch specific ioctls have centralized locking, it is easy to move it to the central dispatcher. Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
19483d14 |
|
12-May-2010 |
Avi Kivity <avi@redhat.com> |
KVM: PPC: Centralize locking of arch specific vcpu ioctls Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
2122ff5e |
|
13-May-2010 |
Avi Kivity <avi@redhat.com> |
KVM: move vcpu locking to dispatcher for generic vcpu ioctls All vcpu ioctls need to be locked, so instead of locking each one specifically we lock at the generic dispatcher. This patch only updates generic ioctls and leaves arch specific ioctls alone. Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
c7f38f46 |
|
15-Apr-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Improve indirect svcpu accessors We already have some inline fuctions we use to access vcpu or svcpu structs, depending on whether we're on booke or book3s. Since we just put a few more registers into the svcpu, we also need to make sure the respective callbacks are available and get used. So this patch moves direct use of the now in the svcpu struct fields to inline function calls. While at it, it also moves the definition of those inline function calls to respective header files for booke and book3s, greatly improving readability. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
287d5611 |
|
01-Apr-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Only use QPRs when available BookE KVM doesn't know about QPRs, so let's not try to access then. This fixes a build error on BookE KVM. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
ad0a048b |
|
24-Mar-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Add OSI hypercall interface MOL uses its own hypercall interface to call back into userspace when the guest wants to do something. So let's implement that as an exit reason, specify it with a CAP and only really use it when userspace wants us to. The only user of it so far is MOL. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
71fbfd5f |
|
24-Mar-2010 |
Alexander Graf <agraf@suse.de> |
KVM: Add support for enabling capabilities per-vcpu Some times we don't want all capabilities to be available to all our vcpus. One example for that is the OSI interface, implemented in the next patch. In order to have a generic mechanism in how to enable capabilities individually, this patch introduces a new ioctl that can be used for this purpose. That way features we don't want in all guests or userspace configurations can just not be enabled and we're good. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
18978768 |
|
24-Mar-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Allow userspace to unset the IRQ line Userspace can tell us that it wants to trigger an interrupt. But so far it can't tell us that it wants to stop triggering one. So let's interpret the parameter to the ioctl that we have anyways to tell us if we want to raise or lower the interrupt line. Signed-off-by: Alexander Graf <agraf@suse.de> v2 -> v3: - Add CAP for unset irq Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
06056bfb |
|
08-Mar-2010 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
KVM: PPC: Do not create debugfs if fail to create vcpu If fail to create the vcpu, we should not create the debugfs for it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Alexander Graf <agraf@suse.de> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
a595405d |
|
22-Feb-2010 |
Alexander Graf <alex@csgraf.de> |
KVM: PPC: Destory timer on vcpu destruction When we destory a vcpu, we should also make sure to kill all pending timers that could still be up. When not doing this, hrtimers might dereference null pointers trying to call our code. This patch fixes spontanious kernel panics seen after closing VMs. Signed-off-by: Alexander Graf <alex@csgraf.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
c10207fe |
|
19-Feb-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Add capability for paired singles We need to tell userspace that we can emulate paired single instructions. So let's add a capability export. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
3587d534 |
|
19-Feb-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Teach MMIO Signedness The guest I was trying to get to run uses the LHA and LHAU instructions. Those instructions basically do a load, but also sign extend the result. Since we need to fill our registers by hand when doing MMIO, we also need to sign extend manually. This patch implements sign extended MMIO and the LHA(U) instructions. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
b104d066 |
|
19-Feb-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Enable MMIO to do 64 bits, fprs and qprs Right now MMIO access can only happen for GPRs and is at most 32 bit wide. That's actually enough for almost all types of hardware out there. Unfortunately, the guest I was using used FPU writes to MMIO regions, so it ended up writing 64 bit MMIOs using FPRs and QPRs. So let's add code to handle those odd cases too. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
64749204 |
|
18-Jan-2010 |
Marcelo Tosatti <mtosatti@redhat.com> |
KVM: fix cleanup_srcu_struct on vm destruction cleanup_srcu_struct on VM destruction remains broken: BUG: unable to handle kernel paging request at ffffffffffffffff IP: [<ffffffff802533d2>] srcu_read_lock+0x16/0x21 RIP: 0010:[<ffffffff802533d2>] [<ffffffff802533d2>] srcu_read_lock+0x16/0x21 Call Trace: [<ffffffffa05354c4>] kvm_arch_vcpu_uninit+0x1b/0x48 [kvm] [<ffffffffa05339c6>] kvm_vcpu_uninit+0x9/0x15 [kvm] [<ffffffffa0569f7d>] vmx_free_vcpu+0x7f/0x8f [kvm_intel] [<ffffffffa05357b5>] kvm_arch_destroy_vm+0x78/0x111 [kvm] [<ffffffffa053315b>] kvm_put_kvm+0xd4/0xfe [kvm] Move it to kvm_arch_destroy_vm. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
|
#
8e5b26b5 |
|
07-Jan-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Use accessor functions for GPR access All code in PPC KVM currently accesses gprs in the vcpu struct directly. While there's nothing wrong with that wrt the current way gprs are stored and loaded, it doesn't suffice for the PACA acceleration that will follow in this patchset. So let's just create little wrapper inline functions that we call whenever a GPR needs to be read from or written to. The compiled code shouldn't really change at all for now. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
f7784b8e |
|
23-Dec-2009 |
Marcelo Tosatti <mtosatti@redhat.com> |
KVM: split kvm_arch_set_memory_region into prepare and commit Required for SRCU convertion later. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
e15a1137 |
|
29-Nov-2009 |
Alexander Graf <agraf@suse.de> |
powerpc/kvm: Sync guest visible MMU state Currently userspace has no chance to find out which virtual address space we're in and resolve addresses. While that is a big problem for migration, it's also unpleasent when debugging, as gdb and the monitor don't work on virtual addresses. This patch exports enough of the MMU segment state to userspace to make debugging work and thus also includes the groundwork for migration. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
10474ae8 |
|
15-Sep-2009 |
Alexander Graf <agraf@suse.de> |
KVM: Activate Virtualization On Demand X86 CPUs need to have some magic happening to enable the virtualization extensions on them. This magic can result in unpleasant results for users, like blocking other VMMs from working (vmx) or using invalid TLB entries (svm). Currently KVM activates virtualization when the respective kernel module is loaded. This blocks us from autoloading KVM modules without breaking other VMMs. To circumvent this problem at least a bit, this patch introduces on demand activation of virtualization. This means, that instead virtualization is enabled on creation of the first virtual machine and disabled on destruction of the last one. So using this, KVM can be easily autoloaded, while keeping other hypervisors usable. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
367e1319 |
|
26-Aug-2009 |
Avi Kivity <avi@redhat.com> |
KVM: Return -ENOTTY on unrecognized ioctls Not the incorrect -EINVAL. Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
544c6761 |
|
01-Nov-2009 |
Alexander Graf <agraf@suse.de> |
Use hrtimers for the decrementer Following S390's good example we should use hrtimers for the decrementer too! This patch converts the timer from the old mechanism to hrtimers. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
4e755758 |
|
29-Oct-2009 |
Alexander Graf <agraf@suse.de> |
Move dirty logging code to sub-arch PowerPC code handles dirty logging in the generic parts atm. While this is great for "return -ENOTSUPP", we need to be rather target specific when actually implementing it. So let's split it to implementation specific code, so we can implement it for book3s. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
a1b37100 |
|
09-Jul-2009 |
Gleb Natapov <gleb@redhat.com> |
KVM: Reduce runnability interface with arch support code Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from interface between general code and arch code. kvm_arch_vcpu_runnable() checks for interrupts instead. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
46f43c6e |
|
18-Jun-2009 |
Marcelo Tosatti <mtosatti@redhat.com> |
KVM: powerpc: convert marker probes to event trace [avi: make it build] [avi: fold trace-arch.h into trace.h] CC: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
988a2cae |
|
09-Jun-2009 |
Gleb Natapov <gleb@redhat.com> |
KVM: Use macro to iterate over vcpus. [christian: remove unused variables on s390] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
78646121 |
|
22-Mar-2009 |
Gleb Natapov <gleb@redhat.com> |
KVM: Fix interrupt unhalting a vcpu when it shouldn't kvm_vcpu_block() unhalts vpu on an interrupt/timer without checking if interrupt window is actually opened. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
f5d0906b |
|
04-Jan-2009 |
Hollis Blanchard <hollisb@us.ibm.com> |
KVM: ppc: remove debug support broken by KVM debug rewrite After the rewrite of KVM's debug support, this code doesn't even build any more. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
ecc0981f |
|
03-Jan-2009 |
Hollis Blanchard <hollisb@us.ibm.com> |
KVM: ppc: cosmetic changes to mmu hook names Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
d0bfb940 |
|
15-Dec-2008 |
Jan Kiszka <jan.kiszka@siemens.com> |
KVM: New guest debug interface This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic part, controlling the "main switch" and the single-step feature. The arch specific part adds an x86 interface for intercepting both types of debug exceptions separately and re-injecting them when the host was not interested. Moveover, the foundation for guest debugging via debug registers is layed. To signal breakpoint events properly back to userland, an arch-specific data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block contains the PC, the debug exception, and relevant debug registers to tell debug events properly apart. The availability of this new interface is signaled by KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are provided. Note that both SVM and VTX are supported, but only the latter was tested yet. Based on the experience with all those VTX corner case, I would be fairly surprised if SVM will work out of the box. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
ad8ba2cd |
|
05-Jan-2009 |
Sheng Yang <sheng@linux.intel.com> |
KVM: Add kvm_arch_sync_events to sync with asynchronize events kvm_arch_sync_events is introduced to quiet down all other events may happen contemporary with VM destroy process, like IRQ handler and work struct for assigned device. For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so the state of KVM here is legal and can provide a environment to quiet down other events. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
ca9edaee |
|
08-Dec-2008 |
Avi Kivity <avi@redhat.com> |
KVM: Consolidate userspace memory capability reporting into common code Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
73e75b41 |
|
02-Dec-2008 |
Hollis Blanchard <hollisb@us.ibm.com> |
KVM: ppc: Implement in-kernel exit timing statistics Existing KVM statistics are either just counters (kvm_stat) reported for KVM generally or trace based aproaches like kvm_trace. For KVM on powerpc we had the need to track the timings of the different exit types. While this could be achieved parsing data created with a kvm_trace extension this adds too much overhead (at least on embedded PowerPC) slowing down the workloads we wanted to measure. Therefore this patch adds a in-kernel exit timing statistic to the powerpc kvm code. These statistic is available per vm&vcpu under the kvm debugfs directory. As this statistic is low, but still some overhead it can be enabled via a .config entry and should be off by default. Since this patch touched all powerpc kvm_stat code anyway this code is now merged and simplified together with the exit timing statistic code (still working with exit timing disabled in .config). Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
5cf8ca22 |
|
05-Nov-2008 |
Hollis Blanchard <hollisb@us.ibm.com> |
KVM: ppc: adjust vcpu types to support 64-bit cores However, some of these fields could be split into separate per-core structures in the future. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
db93f574 |
|
05-Nov-2008 |
Hollis Blanchard <hollisb@us.ibm.com> |
KVM: ppc: create struct kvm_vcpu_44x and introduce container_of() accessor This patch doesn't yet move all 44x-specific data into the new structure, but is the first step down that path. In the future we may also want to create a struct kvm_vcpu_booke. Based on patch from Liu Yu <yu.liu@freescale.com>. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
9dd921cf |
|
05-Nov-2008 |
Hollis Blanchard <hollisb@us.ibm.com> |
KVM: ppc: Refactor powerpc.c to relocate 440-specific code This introduces a set of core-provided hooks. For 440, some of these are implemented by booke.c, with the rest in (the new) 44x.c. Note that these hooks are link-time, not run-time. Since it is not possible to build a single kernel for both e500 and 440 (for example), using function pointers would only add overhead. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
fad7b9b5 |
|
22-Dec-2008 |
Paul Mackerras <paulus@samba.org> |
powerpc: Fix KVM build on ppc440 Commit 2a4aca1144394653269720ffbb5a325a77abd5fa ("powerpc/mm: Split low level tlb invalidate for nohash processors") changed a call to _tlbia to _tlbil_all but didn't include the header that defines _tlbil_all, leading to a build failure on 440 if KVM is enabled. This fixes it. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
2a4aca11 |
|
18-Dec-2008 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc/mm: Split low level tlb invalidate for nohash processors Currently, the various forms of low level TLB invalidations are all implemented in misc_32.S for 32-bit processors, in a fairly scary mess of #ifdef's and with interesting duplication such as a whole bunch of code for FSL _tlbie and _tlbia which are no longer used. This moves things around such that _tlbie is now defined in hash_low_32.S and is only used by the 32-bit hash code, and all nohash CPUs use the various _tlbil_* forms that are now moved to a new file, tlb_nohash_low.S. I moved all the definitions for that stuff out of include/asm/tlbflush.h as they are really internal mm stuff, into mm/mmu_decl.h The code should have no functional changes. I kept some variants inline for trivial forms on things like 40x and 8xx. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
c30f8a6c |
|
24-Nov-2008 |
Hollis Blanchard <hollisb@us.ibm.com> |
KVM: ppc: stop leaking host memory on VM exit When the VM exits, we must call put_page() for every page referenced in the shadow TLB. Without this patch, we usually leak 30-50 host pages (120 - 200 KiB with 4 KiB pages). The maximum number of pages leaked is the size of our shadow TLB, 64 pages. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
83aae4a8 |
|
25-Jul-2008 |
Hollis Blanchard <hollisb@us.ibm.com> |
KVM: ppc: Write only modified shadow entries into the TLB on exit Track which TLB entries need to be written, instead of overwriting everything below the high water mark. Typically only a single guest TLB entry will be modified in a single exit. Guest boot time performance improvement: about 15%. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
6a0ab738 |
|
25-Jul-2008 |
Hollis Blanchard <hollisb@us.ibm.com> |
KVM: ppc: guest breakpoint support Allow host userspace to program hardware debug registers to set breakpoints inside guests. Signed-off-by: Jerone Young <jyoung5@us.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
34d4cb8f |
|
10-Jul-2008 |
Marcelo Tosatti <mtosatti@redhat.com> |
KVM: MMU: nuke shadowed pgtable pages and ptes on memslot destruction Flush the shadow mmu before removing regions to avoid stale entries. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
588968b6 |
|
30-May-2008 |
Laurent Vivier <Laurent.Vivier@bull.net> |
KVM: Add coalesced MMIO support (powerpc part) This patch enables coalesced MMIO for powerpc architecture. It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO. It enables the compilation of coalesced_mmio.c. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
7cc88830 |
|
13-May-2008 |
Avi Kivity <avi@qumranet.com> |
KVM: Remove decache_vcpus_on_cpu() and related callbacks Obsoleted by the vmx-specific per-cpu list. Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
45c5eb67 |
|
25-Apr-2008 |
Hollis Blanchard <hollisb@us.ibm.com> |
KVM: ppc: Handle guest idle by emulating MSR[WE] writes This reduces host CPU usage when the guest is idle. However, the guest must set MSR[WE] in its idle loop, which Linux did not do until 2.6.26. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Jerone Young <jyoung5@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
bbf45ba5 |
|
16-Apr-2008 |
Hollis Blanchard <hollisb@us.ibm.com> |
KVM: ppc: PowerPC 440 KVM implementation This functionality is definitely experimental, but is capable of running unmodified PowerPC 440 Linux kernels as guests on a PowerPC 440 host. (Only tested with 440EP "Bamboo" guests so far, but with appropriate userspace support other SoC/board combinations should work.) See Documentation/powerpc/kvm_440.txt for technical details. [stephen: build fix] Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
|