#
6de2e837 |
|
13-Sep-2023 |
Jordan Niethe <jniethe5@gmail.com> |
KVM: PPC: Book3S HV: Introduce low level MSR accessor kvmppc_get_msr() and kvmppc_set_msr_fast() serve as accessors for the MSR. However because the MSR is kept in the shared regs they include a conditional check for kvmppc_shared_big_endian() and endian conversion. Within the Book3S HV specific code there are direct reads and writes of shregs::msr. In preparation for Nested APIv2 these accesses need to be replaced with accessor functions so it is possible to extend their behavior. However, using the kvmppc_get_msr() and kvmppc_set_msr_fast() functions is undesirable because it would introduce a conditional branch and endian conversion that is not currently present. kvmppc_set_msr_hv() already exists, it is used for the kvmppc_ops::set_msr callback. Introduce a low level accessor __kvmppc_{s,g}et_msr_hv() that simply gets and sets shregs::msr. This will be extended for Nested APIv2 support. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230914030600.16993-8-jniethe5@gmail.com
|
#
0e85b7df |
|
13-Sep-2023 |
Jordan Niethe <jniethe5@gmail.com> |
KVM: PPC: Always use the GPR accessors Always use the GPR accessor functions. This will be important later for Nested APIv2 support which requires additional functionality for accessing and modifying VCPU state. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230914030600.16993-2-jniethe5@gmail.com
|
#
86dacd96 |
|
09-May-2023 |
Rohan McLure <rmclure@linux.ibm.com> |
powerpc: Mark writes registering ipi to host cpu through kvm and polling Mark writes to hypervisor ipi state so that KCSAN recognises these asynchronous issue of kvmppc_{set,clear}_host_ipi to be intended, with atomic writes. Mark asynchronous polls to this variable in kvm_ppc_read_one_intr(). Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230510033117.1395895-9-rmclure@linux.ibm.com
|
#
7ef3d06f |
|
27-Jul-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
powerpc/powernv/kvm: Use darn for H_RANDOM on Power9 The existing logic in KVM to support guests calling H_RANDOM only works on Power8, because it looks for an RNG in the device tree, but on Power9 we just use darn. In addition the existing code needs to work in real mode, so we have the special cased powernv_get_random_real_mode() to deal with that. Instead just have KVM call ppc_md.get_random_seed(), and do the real mode check inside of there, that way we use whatever RNG is available, including darn on Power9. Fixes: e928e9cb3601 ("KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.") Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> [mpe: Rebase on previous commit, update change log appropriately] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220727143219.2684192-2-mpe@ellerman.id.au
|
#
b8c7ee79 |
|
11-Jul-2022 |
Murilo Opsfelder Araujo <muriloo@linux.ibm.com> |
KVM: PPC: Book3s HV: Remove unused function kvmppc_bad_interrupt The commit fae5c9f3664b ("KVM: PPC: Book3S HV: remove ISA v3.0 and v3.1 support from P7/8 path") removed the last reference to the function. Fixes: fae5c9f3664b ("KVM: PPC: Book3S HV: remove ISA v3.0 and v3.1 support from P7/8 path") Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220711223617.63625-3-muriloo@linux.ibm.com
|
#
b22af904 |
|
09-May-2022 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers Currently we have 2 sets of interrupt controller hypercalls handlers for real and virtual modes, this is from POWER8 times when switching MMU on was considered an expensive operation. POWER9 however does not have dependent threads and MMU is enabled for handling hcalls so the XIVE native or XICS-on-XIVE real mode handlers never execute on real P9 and later CPUs. This untemplates the handlers, keeps only the real mode handlers for XICS native (up to POWER8), and removes the rest of the dead code. Changes in functions are mechanical except for a few missing empty lines to make checkpatch.pl happy. The default implemented hcalls list already contains XICS hcalls so no change there. This should not cause any behavioral change. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220509071150.181250-1-aik@ozlabs.ru
|
#
76222808 |
|
04-Mar-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Move C prototypes out of asm-prototypes.h We originally added asm-prototypes.h in commit 42f5b4cacd78 ("powerpc: Introduce asm-prototypes.h"). Its purpose was for prototypes of C functions that are only called from asm, in order to fix sparse warnings about missing prototypes. A few months later Nick added a different use case in commit 4efca4ed05cb ("kbuild: modversions for EXPORT_SYMBOL() for asm") for C prototypes for exported asm functions. This is basically the inverse of our original usage. Since then we've added various prototypes to asm-prototypes.h for both reasons, meaning we now need to unstitch it all. Dispatch prototypes of C functions into relevant headers and keep only the prototypes for functions defined in assembly. For the time being, leave prom_init() there because moving it into asm/prom.h or asm/setup.h conflicts with drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowrom.o This will be fixed later by untangling asm/pci.h and asm/prom.h or by renaming the function in shadowrom.c Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/62d46904eca74042097acf4cb12c175e3067f3d1.1646413435.git.christophe.leroy@csgroup.eu
|
#
6398326b |
|
23-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV P9: Stop using vc->dpdes The P9 path uses vc->dpdes only for msgsndp / SMT emulation. This adds an ordering requirement between vcpu->doorbell_request and vc->dpdes for no real benefit. Use vcpu->doorbell_request directly. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-53-npiggin@gmail.com
|
#
0ba0e5d5 |
|
23-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Split P8 from P9 path guest vCPU TLB flushing This creates separate functions for old and new paths for vCPU TLB flushing, which will reduce complexity of the next change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-43-npiggin@gmail.com
|
#
cf0b0e37 |
|
18-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB The POWER9 ERAT flush instruction is a SLBIA with IH=7, which is a reserved value on POWER7/8. On POWER8 this invalidates the SLB entries above index 0, similarly to SLBIA IH=0. If the SLB entries are invalidated, and then the guest is bypassed, the host SLB does not get re-loaded, so the bolted entries above 0 will be lost. This can result in kernel stack access causing a SLB fault. Kernel stack access causing a SLB fault was responsible for the infamous mega bug (search "Fix SLB reload bug"). Although since commit 48e7b7695745 ("powerpc/64s/hash: Convert SLB miss handlers to C") that starts using the kernel stack in the SLB miss handler, it might only result in an infinite loop of SLB faults. In any case it's a bug. Fix this by only executing the instruction on >= POWER9 where IH=7 is defined not to invalidate the SLB. POWER7/8 don't require this ERAT flush. Fixes: 500871125920 ("KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211119031627.577853-1-npiggin@gmail.com
|
#
5ae36401 |
|
03-Aug-2021 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
powerpc: Replace deprecated CPU-hotplug functions. The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210803141621.780504-4-bigeasy@linutronix.de
|
#
77bbbc0c |
|
01-Jun-2021 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Fix TLB management on SMT8 POWER9 and POWER10 processors The POWER9 vCPU TLB management code assumes all threads in a core share a TLB, and that TLBIEL executed by one thread will invalidate TLBs for all threads. This is not the case for SMT8 capable POWER9 and POWER10 (big core) processors, where the TLB is split between groups of threads. This results in TLB multi-hits, random data corruption, etc. Fix this by introducing cpu_first_tlb_thread_sibling etc., to determine which siblings share TLBs, and use that in the guest TLB flushing code. [npiggin@gmail.com: add changelog and comment] Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210602040441.3984352-1-npiggin@gmail.com
|
#
2ce008c8 |
|
28-May-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Remove unused nested HV tests in XICS emulation Commit f3c18e9342a44 ("KVM: PPC: Book3S HV: Use XICS hypercalls when running as a nested hypervisor") added nested HV tests in XICS hypercalls, but not all are required. * icp_eoi is only called by kvmppc_deliver_irq_passthru which is only called by kvmppc_check_passthru which is only called by kvmppc_read_one_intr. * kvmppc_read_one_intr is only called by kvmppc_read_intr which is only called by the L0 HV rmhandlers code. * kvmhv_rm_send_ipi is called by: - kvmhv_interrupt_vcore which is only called by kvmhv_commence_exit which is only called by the L0 HV rmhandlers code. - icp_send_hcore_msg which is only called by icp_rm_set_vcpu_irq. - icp_rm_set_vcpu_irq which is only called by icp_rm_try_update - icp_rm_set_vcpu_irq is not nested HV safe because it writes to LPCR directly without a kvmhv_on_pseries test. Nested handlers should not in general be using the rm handlers. The important test seems to be in kvmppc_ipi_thread, which sends the virt-mode H_IPI handler kick to use smp_call_function rather than msgsnd. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-26-npiggin@gmail.com
|
#
dcbac73a |
|
28-May-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Remove virt mode checks from real mode handlers Now that the P7/8 path no longer supports radix, real-mode handlers do not need to deal with being called in virt mode. This change effectively reverts commit acde25726bc6 ("KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers"). It removes a few more real-mode tests in rm hcall handlers, which allows the indirect ops for the xive module to be removed from the built-in xics rm handlers. kvmppc_h_random is renamed to kvmppc_rm_h_random to be a bit more descriptive and consistent with other rm handlers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-25-npiggin@gmail.com
|
#
732f21a3 |
|
11-Apr-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Ensure MSR[HV] is always clear in guest MSR Rather than clear the HV bit from the MSR at guest entry, make it clear that the hypervisor does not allow the guest to set the bit. The HV clear is kept in guest entry for now, but a future patch will warn if it is set. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-13-npiggin@gmail.com
|
#
946cf44a |
|
11-Apr-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Ensure MSR[ME] is always set in guest MSR Rather than add the ME bit to the MSR at guest entry, make it clear that the hypervisor does not allow the guest to clear the bit. The ME set is kept in guest entry for now, but a future patch will warn if it's not present. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-12-npiggin@gmail.com
|
#
3a96570f |
|
30-Jan-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: convert interrupt handlers to use wrappers Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-29-npiggin@gmail.com
|
#
b1b1697a |
|
17-Jan-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support This reverts much of commit c01015091a770 ("KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts"), which was required to run HPT guests on RPT hosts on early POWER9 CPUs without support for "mixed mode", which meant the host could not run with MMU on while guests were running. This code has some corner case bugs, e.g., when the guest hits a machine check or HMI the primary locks up waiting for secondaries to switch LPCR to host, which they never do. This could all be fixed in software, but most CPUs in production have mixed mode support, and those that don't are believed to be all in installations that don't use this capability. So simplify things and remove support. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
e8063940 |
|
06-Oct-2020 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc/mm: Update tlbiel loop on POWER10 With POWER10, single tlbiel instruction invalidates all the congruence class of the TLB and hence we need to issue only one tlbiel with SET=0. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201007053305.232879-1-aneesh.kumar@linux.ibm.com
|
#
04ba0a92 |
|
13-Oct-2020 |
Mike Rapoport <rppt@kernel.org> |
KVM: PPC: Book3S HV: simplify kvm_cma_reserve() Patch series "memblock: seasonal cleaning^w cleanup", v3. These patches simplify several uses of memblock iterators and hide some of the memblock implementation details from the rest of the system. This patch (of 17): The memory size calculation in kvm_cma_reserve() traverses memblock.memory rather than simply call memblock_phys_mem_size(). The comment in that function suggests that at some point there should have been call to memblock_analyze() before memblock_phys_mem_size() could be used. As of now, there is no memblock_analyze() at all and memblock_phys_mem_size() can be used as soon as cold-plug memory is registered with memblock. Replace loop over memblock.memory with a call to memblock_phys_mem_size(). Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Link: https://lkml.kernel.org/r/20200818151634.14343-1-rppt@kernel.org Link: https://lkml.kernel.org/r/20200818151634.14343-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a5a8b258 |
|
13-Jul-2020 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc/kvm/cma: Improve kernel log during boot Current kernel gives: [ 0.000000] cma: Reserved 26224 MiB at 0x0000007959000000 [ 0.000000] hugetlb_cma: reserve 65536 MiB, up to 16384 MiB per node [ 0.000000] cma: Reserved 16384 MiB at 0x0000001800000000 With the fix [ 0.000000] kvm_cma_reserve: reserving 26214 MiB for global area [ 0.000000] cma: Reserved 26224 MiB at 0x0000007959000000 [ 0.000000] hugetlb_cma: reserve 65536 MiB, up to 16384 MiB per node [ 0.000000] cma: Reserved 16384 MiB at 0x0000001800000000 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200713150749.25245-2-aneesh.kumar@linux.ibm.com
|
#
6a13cb0c |
|
02-Oct-2019 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Implement LPCR[AIL]=3 mode for injected interrupts kvmppc_inject_interrupt does not implement LPCR[AIL]!=0 modes, which can result in the guest receiving interrupts as if LPCR[AIL]=0 contrary to the ISA. In practice, Linux guests cope with this deviation, but it should be fixed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
268f4ef9 |
|
02-Oct-2019 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Reuse kvmppc_inject_interrupt for async guest delivery This consolidates the HV interrupt delivery logic into one place. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
6c46fcce |
|
23-Jun-2019 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/radix: keep kernel ERAT over local process/guest invalidates ISA v3.0 radix modes provide SLBIA variants which can invalidate ERAT for effPID!=0 or for effLPID!=0, which allows user and guest invalidations to retain kernel/host ERAT entries. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
fe7946ce |
|
23-Jun-2019 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Rename PPC_INVALIDATE_ERAT to PPC_ISA_3_0_INVALIDATE_ERAT This makes it clear to the caller that it can only be used on POWER9 and later CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use "ISA_3_0" rather than "ARCH_300"] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
50087112 |
|
19-Jun-2019 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries When a guest vcpu moves from one physical thread to another it is necessary for the host to perform a tlb flush on the previous core if another vcpu from the same guest is going to run there. This is because the guest may use the local form of the tlb invalidation instruction meaning stale tlb entries would persist where it previously ran. This is handled on guest entry in kvmppc_check_need_tlb_flush() which calls flush_guest_tlb() to perform the tlb flush. Previously the generic radix__local_flush_tlb_lpid_guest() function was used, however the functionality was reimplemented in flush_guest_tlb() to avoid the trace_tlbie() call as the flushing may be done in real mode. The reimplementation in flush_guest_tlb() was missing an erat invalidation after flushing the tlb. This led to observable memory corruption in the guest due to the caching of stale translations. Fix this by adding the erat invalidation. Fixes: 70ea13f6e609 ("KVM: PPC: Book3S HV: Flush TLB on secondary radix threads") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
d2912cb1 |
|
04-Jun-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
70ea13f6 |
|
29-Apr-2019 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Flush TLB on secondary radix threads When running on POWER9 with kvm_hv.indep_threads_mode = N and the host in SMT1 mode, KVM will run guest VCPUs on offline secondary threads. If those guests are in radix mode, we fail to load the LPID and flush the TLB if necessary, leading to the guest crashing with an unsupported MMU fault. This arises from commit 9a4506e11b97 ("KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on", 2018-05-17), which didn't consider the case where indep_threads_mode = N. For simplicity, this makes the real-mode guest entry path flush the TLB in the same place for both radix and hash guests, as we did before 9a4506e11b97, though the code is now C code rather than assembly code. We also have the radix TLB flush open-coded rather than calling radix__local_flush_tlb_lpid_guest(), because the TLB flush can be called in real mode, and in real mode we don't want to invoke the tracepoint code. Fixes: 9a4506e11b97 ("KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
2940ba0c |
|
29-Apr-2019 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Move HPT guest TLB flushing to C code This replaces assembler code in book3s_hv_rmhandlers.S that checks the kvm->arch.need_tlb_flush cpumask and optionally does a TLB flush with C code in book3s_hv_builtin.c. Note that unlike the radix version, the hash version doesn't do an explicit ERAT invalidation because we will invalidate and load up the SLB before entering the guest, and that will invalidate the ERAT. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
03f95332 |
|
04-Feb-2019 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVE Currently, the KVM code assumes that if the host kernel is using the XIVE interrupt controller (the new interrupt controller that first appeared in POWER9 systems), then the in-kernel XICS emulation will use the XIVE hardware to deliver interrupts to the guest. However, this only works when the host is running in hypervisor mode and has full access to all of the XIVE functionality. It doesn't work in any nested virtualization scenario, either with PR KVM or nested-HV KVM, because the XICS-on-XIVE code calls directly into the native-XIVE routines, which are not initialized and cannot function correctly because they use OPAL calls, and OPAL is not available in a guest. This means that using the in-kernel XICS emulation in a nested hypervisor that is using XIVE as its interrupt controller will cause a (nested) host kernel crash. To fix this, we change most of the places where the current code calls xive_enabled() to select between the XICS-on-XIVE emulation and the plain XICS emulation to call a new function, xics_on_xive(), which returns false in a guest. However, there is a further twist. The plain XICS emulation has some functions which are used in real mode and access the underlying XICS controller (the interrupt controller of the host) directly. In the case of a nested hypervisor, this means doing XICS hypercalls directly. When the nested host is using XIVE as its interrupt controller, these hypercalls will fail. Therefore this also adds checks in the places where the XICS emulation wants to access the underlying interrupt controller directly, and if that is XIVE, makes the code use the virtual mode fallback paths, which call generic kernel infrastructure rather than doing direct XICS access. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
f3c18e93 |
|
07-Oct-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Use XICS hypercalls when running as a nested hypervisor This adds code to call the H_IPI and H_EOI hypercalls when we are running as a nested hypervisor (i.e. without the CPU_FTR_HVMODE cpu feature) and we would otherwise access the XICS interrupt controller directly or via an OPAL call. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f7035ce9 |
|
07-Oct-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Move interrupt delivery on guest entry to C code This is based on a patch by Suraj Jitindar Singh. This moves the code in book3s_hv_rmhandlers.S that generates an external, decrementer or privileged doorbell interrupt just before entering the guest to C code in book3s_hv_builtin.c. This is to make future maintenance and modification easier. The algorithm expressed in the C code is almost identical to the previous algorithm. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
65182029 |
|
17-Aug-2018 |
Marek Szyprowski <m.szyprowski@samsung.com> |
mm/cma: remove unsupported gfp_mask parameter from cma_alloc() cma_alloc() doesn't really support gfp flags other than __GFP_NOWARN, so convert gfp_mask parameter to boolean no_warn parameter. This will help to avoid giving false feeling that this function supports standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer, what has already been an issue: see commit dd65a941f6ba ("arm64: dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag"). Link: http://lkml.kernel.org/r/20180709122019eucas1p2340da484acfcc932537e6014f4fd2c29~-sqTPJKij2939229392eucas1p2j@eucas1p2.samsung.com Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Michał Nazarewicz <mina86@mina86.com> Acked-by: Laura Abbott <labbott@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7c1bd80c |
|
17-May-2018 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Send kvmppc_bad_interrupt NMIs to Linux handlers It's possible to take a SRESET or MCE in these paths due to a bug in the host code or a NMI IPI, etc. A recent bug attempting to load a virtual address from real mode gave the complete but cryptic error, abridged: Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1] LE SMP NR_CPUS=2048 NUMA PowerNV CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted NIP: c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580 REGS: c000000fff76dd80 TRAP: 0200 Not tainted MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48082222 XER: 00000000 CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3 NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0 LR [c0000000000c2430] do_tlbies+0x230/0x2f0 Sending the NMIs through the Linux handlers gives a nicer output: Severe Machine check interrupt [Not recovered] NIP [c0000000000155ac]: perf_trace_tlbie+0x2c/0x1a0 Initiator: CPU Error type: Real address [Load (bad)] Effective address: d00017fffcc01a28 opal: Machine check interrupt unrecoverable: MSR(RI=0) opal: Hardware platform error: Unrecoverable Machine Check exception CPU: 0 PID: 6700 Comm: qemu-system-ppc Tainted: G M NIP: c0000000000155ac LR: c0000000000c23c0 CTR: c000000000015580 REGS: c000000fff9e9d80 TRAP: 0200 Tainted: G M MSR: 9000000000201001 <SF,HV,ME,LE> CR: 48082222 XER: 00000000 CFAR: 000000010cbc1a30 DAR: d00017fffcc01a28 DSISR: 00000040 SOFTE: 3 NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0 LR [c0000000000c23c0] do_tlbies+0x1c0/0x280 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
1143a706 |
|
07-May-2018 |
Simon Guo <wei.guo.simon@gmail.com> |
KVM: PPC: Add pt_regs into kvm_vcpu_arch and move vcpu->arch.gpr[] into it Currently the regs are scattered across the kvm_vcpu_arch structure, and it is neater to organize them into a pt_regs structure. This will also enable reimplementation of the MMIO emulation code with analyse_instr() later. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
d2e60075 |
|
13-Feb-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: Use array of paca pointers and allocate pacas individually Change the paca array into an array of pointers to pacas. Allocate pacas individually. This allows flexibility in where the PACAs are allocated. Future work will allocate them node-local. Platforms that don't have address limits on PACAs would be able to defer PACA allocations until later in boot rather than allocate all possible ones up-front then freeing unused. This is slightly more overhead (one additional indirection) for cross CPU paca references, but those aren't too common. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c0101509 |
|
18-Oct-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts This patch removes the restriction that a radix host can only run radix guests, allowing us to run HPT (hashed page table) guests as well. This is useful because it provides a way to run old guest kernels that know about POWER8 but not POWER9. Unfortunately, POWER9 currently has a restriction that all threads in a given core must either all be in HPT mode, or all in radix mode. This means that when entering a HPT guest, we have to obtain control of all 4 threads in the core and get them to switch their LPIDR and LPCR registers, even if they are not going to run a guest. On guest exit we also have to get all threads to switch LPIDR and LPCR back to host values. To make this feasible, we require that KVM not be in the "independent threads" mode, and that the CPU cores be in single-threaded mode from the host kernel's perspective (only thread 0 online; threads 1, 2 and 3 offline). That allows us to use the same code as on POWER8 for obtaining control of the secondary threads. To manage the LPCR/LPIDR changes required, we extend the kvm_split_info struct to contain the information needed by the secondary threads. All threads perform a barrier synchronization (where all threads wait for every other thread to reach the synchronization point) on guest entry, both before and after loading LPCR and LPIDR. On guest exit, they all once again perform a barrier synchronization both before and after loading host values into LPCR and LPIDR. Finally, it is also currently necessary to flush the entire TLB every time we enter a HPT guest on a radix host. We do this on thread 0 with a loop of tlbiel instructions. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
00bb6ae5 |
|
26-Oct-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Don't call real-mode XICS hypercall handlers if not enabled When running a guest on a POWER9 system with the in-kernel XICS emulation disabled (for example by running QEMU with the parameter "-machine pseries,kernel_irqchip=off"), the kernel does not pass the XICS-related hypercalls such as H_CPPR up to userspace for emulation there as it should. The reason for this is that the real-mode handlers for these hypercalls don't check whether a XICS device has been instantiated before calling the xics-on-xive code. That code doesn't check either, leading to potential NULL pointer dereferences because vcpu->arch.xive_vcpu is NULL. Those dereferences won't cause an exception in real mode but will lead to kernel memory corruption. This fixes it by adding kvmppc_xics_enabled() checks before calling the XICS functions. Cc: stable@vger.kernel.org # v4.11+ Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
857b99e1 |
|
01-Sep-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Handle unexpected interrupts better At present, if an interrupt (i.e. an exception or trap) occurs in the code where KVM is switching the MMU to or from guest context, we jump to kvmppc_bad_host_intr, where we simply spin with interrupts disabled. In this situation, it is hard to debug what happened because we get no indication as to which interrupt occurred or where. Typically we get a cascade of stall and soft lockup warnings from other CPUs. In order to get more information for debugging, this adds code to create a stack frame on the emergency stack and save register values to it. We start half-way down the emergency stack in order to give ourselves some chance of being able to do a stack trace on secondary threads that are already on the emergency stack. On POWER7 or POWER8, we then just spin, as before, because we don't know what state the MMU context is in or what other threads are doing, and we can't switch back to host context without coordinating with other threads. On POWER9 we can do better; there we load up the host MMU context and jump to C code, which prints an oops message to the console and panics. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
898b25b2 |
|
21-Jun-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Simplify dynamic micro-threading code Since commit b009031f74da ("KVM: PPC: Book3S HV: Take out virtual core piggybacking code", 2016-09-15), we only have at most one vcore per subcore. Previously, the fact that there might be more than one vcore per subcore meant that we had the notion of a "master vcore", which was the vcore that controlled thread 0 of the subcore. We also needed a list per subcore in the core_info struct to record which vcores belonged to each subcore. Now that there can only be one vcore in the subcore, we can replace the list with a simple pointer and get rid of the notion of the master vcore (and in fact treat every vcore as a master vcore). We can also get rid of the subcore_vm[] field in the core_info struct since it is never read. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
acde2572 |
|
10-May-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers POWER9 running a radix guest will take some hypervisor interrupts without going to real mode (turning off the MMU). This means that early hypercall handlers may now be called in virtual mode. Most of the handlers work just fine in both modes, but there are some that can crash the host if called in virtual mode, notably the TCE (IOMMU) hypercalls H_PUT_TCE, H_STUFF_TCE and H_PUT_TCE_INDIRECT. These already have both a real-mode and a virtual-mode version, so we arrange for the real-mode version to return H_TOO_HARD for radix guests, which will result in the virtual-mode version being called. The other hypercall which is sensitive to the MMU mode is H_RANDOM. It doesn't have a virtual-mode version, so this adds code to enable it to be called in either mode. An alternative solution was considered which would refuse to call any of the early hypercall handlers when doing a virtual-mode exit from a radix guest. However, the XICS-on-XIVE code depends on the XICS hypercalls being handled early even for virtual-mode exits, because the handlers need to be called before the XIVE vCPU state has been pulled off the hardware. Therefore that solution would have become quite invasive and complicated, and was rejected in favour of the simpler, though less elegant, solution presented here. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
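The fallback described above (the real-mode handler returns H_TOO_HARD for radix guests, so the virtual-mode version runs instead) can be sketched roughly as follows. This is a minimal illustration, not the kernel code: `struct guest`, `rm_h_put_tce()`, `virt_h_put_tce()` and `handle_h_put_tce()` are hypothetical names, and the numeric values of `H_SUCCESS` and `H_TOO_HARD` are assumptions.

```c
#include <stdbool.h>

/* Assumed values for illustration; the commit text only names the codes. */
#define H_SUCCESS   0L
#define H_TOO_HARD  9999L

/* Hypothetical guest descriptor, not a kernel structure. */
struct guest {
    bool radix;      /* true for a radix guest */
    int rm_calls;    /* times the real-mode body ran */
    int virt_calls;  /* times the virtual-mode body ran */
};

/* Real-mode variant: declines to handle the hypercall for radix guests. */
long rm_h_put_tce(struct guest *g)
{
    if (g->radix)
        return H_TOO_HARD;
    g->rm_calls++;
    return H_SUCCESS;
}

/* Virtual-mode variant: safe to run with the MMU on. */
long virt_h_put_tce(struct guest *g)
{
    g->virt_calls++;
    return H_SUCCESS;
}

/* Try the early handler first; fall back when it reports H_TOO_HARD. */
long handle_h_put_tce(struct guest *g)
{
    long rc = rm_h_put_tce(g);

    if (rc == H_TOO_HARD)
        rc = virt_h_put_tce(g);
    return rc;
}
```

For an HPT guest only the real-mode body runs; for a radix guest only the virtual-mode body runs, and the caller sees the same return code either way.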
|
#
5af50993 |
|
05-Apr-2017 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f318dd08 |
|
18-Apr-2017 |
Laura Abbott <labbott@redhat.com> |
cma: Store a name in the cma structure Frameworks that may want to enumerate CMA heaps (e.g. Ion) will find it useful to have an explicit name attached to each region. Store the name in each CMA structure. Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
d381d7ca |
|
05-Apr-2017 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Consolidate variants of real-mode MMIOs We have all sorts of variants of MMIO accessors for the real mode instructions. This creates a clean set of accessors based on Linux normal naming conventions, replacing all occurrences of the old ones in the tree. I have purposefully removed the "out/in" variants in favor of only including __raw variants. Any code using these is already pretty much hand tuned to operate in a very specific environment. I've fixed up the 2 users (only one of them actually needed a barrier in the first place). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
243e2511 |
|
05-Apr-2017 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc/xive: Native exploitation of the XIVE interrupt controller The XIVE interrupt controller is the new interrupt controller found in POWER9. It supports advanced virtualization capabilities among other things. Currently we use a set of firmware calls that simulate the old "XICS" interrupt controller but this is fairly inefficient. This adds the framework for using XIVE along with a native backend which uses OPAL for configuration. Later, a backend allowing the use in a KVM or PowerVM guest will also be provided. This disables some fast path for interrupts in KVM when XIVE is enabled as these rely on the firmware emulation code which is no longer available when the XIVE is used natively by Linux. A later patch will make KVM also directly exploit the XIVE, thus recovering the lost performance (and more). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Fixup pr_xxx("XIVE:"...), don't split pr_xxx() strings, tweak Kconfig so XIVE_NATIVE selects XIVE and depends on POWERNV, fix build errors when SMP=n, fold in fixes from Ben: Don't call cpu_online() on an invalid CPU number Fix irq target selection returning out of bounds cpu# Extra sanity checks on cpu numbers ] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e2f466e3 |
|
24-Feb-2017 |
Lucas Stach <l.stach@pengutronix.de> |
mm: cma_alloc: allow to specify GFP mask Most users of this interface just want to use it with the default GFP_KERNEL flags, but for cases where DMA memory is allocated it may be called from a different context. No functional change yet, just passing through the flag to the underlying alloc_contig_range function. Link: http://lkml.kernel.org/r/20170127172328.18574-2-l.stach@pengutronix.de Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alexander Graf <agraf@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ab9bad0e |
|
06-Feb-2017 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc/powernv: Remove separate entry for OPAL real mode calls All entry points already read the MSR so they can easily do the right thing. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
db9a290d |
|
19-Dec-2016 |
David Gibson <david@gibson.dropbear.id.au> |
KVM: PPC: Book3S HV: Rename kvm_alloc_hpt() for clarity The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at all obvious from the name. In practice kvmppc_alloc_hpt() allocates an HPT by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate it with CMA only. To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma(). Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
53af3ba2 |
|
30-Jan-2017 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Allow guest exit path to have MMU on If we allow LPCR[AIL] to be set for radix guests, then interrupts from the guest to the host can be delivered by the hardware with relocation on, and thus the code path starting at kvmppc_interrupt_hv can be executed in virtual mode (MMU on) for radix guests (previously it was only ever executed in real mode). Most of the code is indifferent to whether the MMU is on or off, but the calls to OPAL that use the real-mode OPAL entry code need to be switched to use the virtual-mode code instead. The affected calls are the calls to the OPAL XICS emulation functions in kvmppc_read_one_intr() and related functions. We test the MSR[IR] bit to detect whether we are in real or virtual mode, and call the opal_rm_* or opal_* function as appropriate. The other place that depends on the MMU being off is the optimization where the guest exit code jumps to the external interrupt vector or hypervisor doorbell interrupt vector, or returns to its caller (which is __kvmppc_vcore_entry). If the MMU is on and we are returning to the caller, then we don't need to use an rfid instruction since the MMU is already on; a simple blr suffices. If there is an external or hypervisor doorbell interrupt to handle, we branch to the relocation-on version of the interrupt vector. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
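The MSR[IR] test mentioned above can be pictured as a small selector between the real-mode and virtual-mode OPAL entry points. This is only a sketch: the bit position chosen for `MSR_IR` is an assumption, and `enum opal_entry` and `pick_opal_entry()` are hypothetical names standing in for the choice between the `opal_rm_*` and `opal_*` functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position for MSR[IR] (instruction relocation). */
#define MSR_IR (1ULL << 5)

/* Hypothetical tag for which family of OPAL wrappers to call. */
enum opal_entry {
    OPAL_ENTRY_REAL,  /* stands in for the opal_rm_* functions */
    OPAL_ENTRY_VIRT   /* stands in for the opal_* functions */
};

/* Instruction relocation on means the MMU is on (virtual mode). */
bool in_virtual_mode(uint64_t msr)
{
    return (msr & MSR_IR) != 0;
}

/* Choose the entry path that matches the current translation mode. */
enum opal_entry pick_opal_entry(uint64_t msr)
{
    return in_virtual_mode(msr) ? OPAL_ENTRY_VIRT : OPAL_ENTRY_REAL;
}
```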
|
#
e34af784 |
|
30-Nov-2016 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h This moves the prototypes for functions that are only called from assembler code out of asm/asm-prototypes.h into asm/kvm_ppc.h. The prototypes were added in commit ebe4535fbe7a ("KVM: PPC: Book3S HV: sparse: prototypes for functions called from assembler", 2016-10-10), but given that the functions are KVM functions, having them in a KVM header will be better for long-term maintenance. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
e2702871 |
|
23-Nov-2016 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Fix compilation with unusual configurations This adds the "again" parameter to the dummy version of kvmppc_check_passthru(), so that it matches the real version. This fixes compilation with CONFIG_BOOK3S_64_HV set but CONFIG_KVM_XICS=n. This includes asm/smp.h in book3s_hv_builtin.c to fix compilation with CONFIG_SMP=n. The explicit inclusion is necessary to provide definitions of hard_smp_processor_id() and get_hard_smp_processor_id() in UP configs. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
f725758b |
|
17-Nov-2016 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9 POWER9 includes a new interrupt controller, called XIVE, which is quite different from the XICS interrupt controller on POWER7 and POWER8 machines. KVM-HV accesses the XICS directly in several places in order to send and clear IPIs and handle interrupts from PCI devices being passed through to the guest. In order to make the transition to XIVE easier, OPAL firmware will include an emulation of XICS on top of XIVE. Access to the emulated XICS is via OPAL calls. The one complication is that the EOI (end-of-interrupt) function can now return a value indicating that another interrupt is pending; in this case, the XIVE will not signal an interrupt in hardware to the CPU, and software is supposed to acknowledge the new interrupt without waiting for another interrupt to be delivered in hardware. This adapts KVM-HV to use the OPAL calls on machines where there is no XICS hardware. When there is no XICS, we look for a device-tree node with "ibm,opal-intc" in its compatible property, which is how OPAL indicates that it provides XICS emulation. In order to handle the EOI return value, kvmppc_read_intr() has become kvmppc_read_one_intr(), with a boolean variable passed by reference which can be set by the EOI functions to indicate that another interrupt is pending. The new kvmppc_read_intr() keeps calling kvmppc_read_one_intr() until there are no more interrupts to process. The return value from kvmppc_read_intr() is the largest non-zero value of the returns from kvmppc_read_one_intr(). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
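The loop structure described above (call kvmppc_read_one_intr() until no further interrupt is pending, and return the largest non-zero result) might look roughly like this. The function-pointer type, the `ctx` argument and the `scripted_read_one()` helper are hypothetical scaffolding so the sketch can run outside the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for kvmppc_read_one_intr(): returns a status code
 * and sets *again when the EOI reported another pending interrupt. */
typedef long (*read_one_fn)(bool *again, void *ctx);

/* Keep reading while more work is reported; return the largest non-zero
 * value seen, or 0 if every call returned 0. */
long read_intr_loop(read_one_fn read_one, void *ctx)
{
    long ret = 0;
    bool again;

    do {
        again = false;
        long rc = read_one(&again, ctx);
        if (rc && (ret == 0 || rc > ret))
            ret = rc;
    } while (again);

    return ret;
}

/* Demo helper: replays a fixed list of return codes, one per call. */
struct script {
    const long *rc;
    int len;
    int pos;
};

long scripted_read_one(bool *again, void *ctx)
{
    struct script *s = ctx;
    long rc = s->rc[s->pos++];

    *again = s->pos < s->len;  /* more interrupts pending? */
    return rc;
}
```

Resetting `again` to false before each call means the callee only has to set it when the EOI path actually reports a further pending interrupt.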
|
#
1704a81c |
|
17-Nov-2016 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9 On POWER9, the msgsnd instruction is able to send interrupts to other cores, as well as other threads on the local core. Since msgsnd is generally simpler and faster than sending an IPI via the XICS, we use msgsnd for all IPIs sent by KVM on POWER9. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
ebe4535f |
|
09-Oct-2016 |
Daniel Axtens <dja@axtens.net> |
KVM: PPC: Book3S HV: sparse: prototypes for functions called from assembler A bunch of KVM functions are only called from assembler. Give them prototypes in asm-prototypes.h. This reduces sparse warnings. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
f7af5209 |
|
18-Aug-2016 |
Suresh Warrier <warrier@linux.vnet.ibm.com> |
KVM: PPC: Book3S HV: Complete passthrough interrupt in host In existing real mode ICP code, when updating the virtual ICP state, if there is a required action that cannot be completely handled in real mode, as for instance, a VCPU needs to be woken up, flags are set in the ICP to indicate the required action. This is checked when returning from hypercalls to decide whether the call needs to switch back to the host where the action can be performed in virtual mode. Note that if h_ipi_redirect is enabled, real mode code will first try to message a free host CPU to complete this job instead of returning to the host to do it ourselves. Currently, the real mode PCI passthrough interrupt handling code checks if any of these flags are set and simply returns to the host. This is not good enough as the trap value (0x500) is treated as an external interrupt by the host code. It is only when the trap value is a hypercall that the host code searches for and acts on unfinished work by calling kvmppc_xics_rm_complete. This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD which is returned by KVM if there is unfinished business to be completed in host virtual mode after handling a PCI passthrough interrupt. The host checks for this special interrupt condition and calls into the kvmppc_xics_rm_complete, which is made an exported function for this reason. [paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
e3c13e56 |
|
18-Aug-2016 |
Suresh Warrier <warrier@linux.vnet.ibm.com> |
KVM: PPC: Book3S HV: Handle passthrough interrupts in guest Currently, KVM switches back to the host to handle any external interrupt (when the interrupt is received while running in the guest). This patch updates real-mode KVM to check if an interrupt is generated by a passthrough adapter that is owned by this guest. If so, the real mode KVM will directly inject the corresponding virtual interrupt to the guest VCPU's ICS and also EOI the interrupt in hardware. In short, the interrupt is handled entirely in real mode in the guest context without switching back to the host. In some rare cases, the interrupt cannot be completely handled in real mode, for instance, a VCPU that is sleeping needs to be woken up. In this case, KVM simply switches back to the host with trap reason set to 0x500. This works, but it is clearly not very efficient. A following patch will distinguish this case and handle it correctly in the host. Note that we can use the existing check_too_hard() routine even though we are not in a hypercall to determine if there is unfinished business that needs to be completed in host virtual mode. The patch assumes that the mapping between hardware interrupt IRQ and virtual IRQ to be injected to the guest already exists for the PCI passthrough interrupts that need to be handled in real mode. If the mapping does not exist, KVM falls back to the default existing behavior. The KVM real mode code reads mappings from the mapped array in the passthrough IRQ map without taking any lock. We carefully order the loads and stores of the fields in the kvmppc_irq_map data structure using memory barriers to avoid an inconsistent mapping being seen by the reader. Thus, although it is possible to miss a map entry, it is not possible to read a stale value. 
[paulus@ozlabs.org - get irq_chip from irq_map rather than pimap, pulled out powernv eoi change into a separate patch, made kvmppc_read_intr get the vcpu from the paca rather than being passed in, rewrote the logic at the end of kvmppc_read_intr to avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S since we were always restoring SRR0/1 anyway, get rid of the cached array (just use the mapped array), removed the kick_all_cpus_sync() call, clear saved_xirr PACA field when we handle the interrupt in real mode, fix compilation with CONFIG_KVM_XICS=n.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
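The lock-free read of the mapped array can be illustrated with a publish/consume pattern. The sketch below uses C11 release/acquire atomics as a stand-in for the kernel's memory barriers, assumes a single writer (serialized elsewhere), and uses hypothetical names throughout (`struct irq_map`, `map_add()`, `map_lookup()`). It is not the kvmppc_irq_map code itself.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_MAPPED 8

/* Hypothetical entry: hardware IRQ and the virtual IRQ to inject. */
struct irq_map_entry {
    uint32_t hw_irq;
    uint32_t virt_irq;
};

struct irq_map {
    atomic_uint n_mapped;                      /* count of published entries */
    struct irq_map_entry mapped[MAX_MAPPED];
};

void map_init(struct irq_map *m)
{
    atomic_init(&m->n_mapped, 0);
}

/* Writer (assumed single): fill the entry first, then publish the count. */
int map_add(struct irq_map *m, uint32_t hw_irq, uint32_t virt_irq)
{
    unsigned int n = atomic_load_explicit(&m->n_mapped, memory_order_relaxed);

    if (n >= MAX_MAPPED)
        return -1;
    m->mapped[n].hw_irq = hw_irq;
    m->mapped[n].virt_irq = virt_irq;
    /* Release: the entry's fields become visible before the new count. */
    atomic_store_explicit(&m->n_mapped, n + 1, memory_order_release);
    return 0;
}

/* Reader, no lock: may miss an entry still being added, but never sees a
 * half-written one. */
int map_lookup(struct irq_map *m, uint32_t hw_irq, uint32_t *virt_irq)
{
    unsigned int n = atomic_load_explicit(&m->n_mapped, memory_order_acquire);

    for (unsigned int i = 0; i < n; i++) {
        if (m->mapped[i].hw_irq == hw_irq) {
            *virt_irq = m->mapped[i].virt_irq;
            return 0;
        }
    }
    return -1;  /* caller falls back to the default path */
}
```

A failed lookup corresponds to the case where no mapping exists yet, in which the commit says KVM falls back to the existing behavior.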
|
#
37f55d30 |
|
18-Aug-2016 |
Suresh Warrier <warrier@linux.vnet.ibm.com> |
KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function Modify kvmppc_read_intr to make it a C function. Because it is called from kvmppc_check_wake_reason, any of the assembler code that calls either kvmppc_read_intr or kvmppc_check_wake_reason now has to assume that the volatile registers might have been modified. This also adds in the optimization of clearing saved_xirr in the case where we completely handle and EOI an IPI. Without this, the next device interrupt will require two trips through the host interrupt handling code. [paulus@ozlabs.org - made kvmppc_check_wake_reason create a stack frame when it is calling kvmppc_read_intr, which means we can set r12 to the trap number (0x500) after the call to kvmppc_read_intr, instead of using r31. Also moved the deliver_guest_interrupt label so as to restore XER and CTR, plus other minor tweaks.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
79b6c247 |
|
17-Dec-2015 |
Suresh Warrier <warrier@linux.vnet.ibm.com> |
KVM: PPC: Book3S HV: Host-side RM data structures This patch defines the data structures to support the setting up of host side operations while running in real mode in the guest, and also the functions to allocate and free it. The operations are for now limited to virtual XICS operations. Currently, we have only defined one operation in the data structure: - Wake up a VCPU sleeping in the host when it receives a virtual interrupt The operations are assigned at the core level because PowerKVM requires that the host run in SMT off mode. For each core, we will need to manage its state atomically - where the state is defined by: 1. Is the core running in the host? 2. Is there a Real Mode (RM) operation pending on the host? Currently, core state is only managed at the whole-core level even when the system is in split-core mode. This just limits the number of free or "available" cores in the host to perform any host-side operations. The kvmppc_host_rm_core.rm_data allows any data to be passed by KVM in real mode to the host core along with the operation to be performed. The kvmppc_host_rm_ops structure is allocated the very first time a guest VM is started. Initial core state is also set - all online cores are in the host. This structure is never deleted, not even when there are no active guests. However, it needs to be freed when the module is unloaded because the kvmppc_host_rm_ops_hv can contain function pointers to kvm-hv.ko functions for the different supported host operations. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
b4deba5c |
|
02-Jul-2015 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8 This builds on the ability to run more than one vcore on a physical core by using the micro-threading (split-core) modes of the POWER8 chip. Previously, only vcores from the same VM could be run together, and (on POWER8) only if they had just one thread per core. With the ability to split the core on guest entry and unsplit it on guest exit, we can run up to 8 vcpu threads from up to 4 different VMs, and we can run multiple vcores with 2 or 4 vcpus per vcore. Dynamic micro-threading is only available if the static configuration of the cores is whole-core mode (unsplit), and only on POWER8. To manage this, we introduce a new kvm_split_mode struct which is shared across all of the subcores in the core, with a pointer in the paca on each thread. In addition we extend the core_info struct to have information on each subcore. When deciding whether to add a vcore to the set already on the core, we now have two possibilities: (a) piggyback the vcore onto an existing subcore, or (b) start a new subcore. Currently, when any vcpu needs to exit the guest and switch to host virtual mode, we interrupt all the threads in all subcores and switch the core back to whole-core mode. It may be possible in future to allow some of the subcores to keep executing in the guest while subcore 0 switches to the host, but that is not implemented in this patch. This adds a module parameter called dynamic_mt_modes which controls which micro-threading (split-core) modes the code will consider, as a bitmap. In other words, if it is 0, no micro-threading mode is considered; if it is 2, only 2-way micro-threading is considered; if it is 4, only 4-way, and if it is 6, both 2-way and 4-way micro-threading mode will be considered. The default is 6. With this, we now have secondary threads which are the primary thread for their subcore and therefore need to do the MMU switch. 
These threads will need to be started even if they have no vcpu to run, so we use the vcore pointer in the PACA rather than the vcpu pointer to trigger them. It is now possible for thread 0 to find that an exit has been requested before it gets to switch the subcore state to the guest. In that case we haven't added the guest's timebase offset to the timebase, so we need to be careful not to subtract the offset in the guest exit path. In fact we just skip the whole path that switches back to host context, since we haven't switched to the guest context. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
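The bitmap interpretation of dynamic_mt_modes given above (0 = none, 2 = 2-way only, 4 = 4-way only, 6 = both) amounts to a single bit test. A minimal sketch, with `mt_mode_allowed()` as a hypothetical helper name:

```c
#include <stdbool.h>

/* Returns true if an n-way split (n = 2 or 4) is permitted by the
 * dynamic_mt_modes bitmap; any other n is rejected. */
bool mt_mode_allowed(unsigned int dynamic_mt_modes, unsigned int n_way)
{
    if (n_way != 2 && n_way != 4)
        return false;
    return (dynamic_mt_modes & n_way) != 0;
}
```

Because the mode values 2 and 4 are themselves single bits, the way count doubles as the mask, which is why 6 (2 | 4) enables both modes.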
|
#
ec257165 |
|
24-Jun-2015 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Make use of unused threads when running guests When running a virtual core of a guest that is configured with fewer threads per core than the physical cores have, the extra physical threads are currently unused. This makes it possible to use them to run one or more other virtual cores from the same guest when certain conditions are met. This applies on POWER7, and on POWER8 to guests with one thread per virtual core. (It doesn't apply to POWER8 guests with multiple threads per vcore because they require a 1-1 virtual to physical thread mapping in order to be able to use msgsndp and the TIR.) The idea is that we maintain a list of preempted vcores for each physical cpu (i.e. each core, since the host runs single-threaded). Then, when a vcore is about to run, it checks to see if there are any vcores on the list for its physical cpu that could be piggybacked onto this vcore's execution. If so, those additional vcores are put into state VCORE_PIGGYBACK and their runnable VCPU threads are started as well as the original vcore, which is called the master vcore. After the vcores have exited the guest, the extra ones are put back onto the preempted list if any of their VCPUs are still runnable and not idle. This means that vcpu->arch.ptid is no longer necessarily the same as the physical thread that the vcpu runs on. In order to make it easier for code that wants to send an IPI to know which CPU to target, we now store that in a new field in struct vcpu_arch, called thread_cpu. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
66feed61 |
|
27-Mar-2015 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 This uses msgsnd where possible for signalling other threads within the same core on POWER8 systems, rather than IPIs through the XICS interrupt controller. This includes waking secondary threads to run the guest, the interrupts generated by the virtual XICS, and the interrupts to bring the other threads out of the guest when exiting. Aggregated statistics from debugfs across vcpus for a guest with 32 vcpus, 8 threads/vcore, running on a POWER8, show this before the change: rm_entry: 3387.6ns (228 - 86600, 1008969 samples) rm_exit: 4561.5ns (12 - 3477452, 1009402 samples) rm_intr: 1660.0ns (12 - 553050, 3600051 samples) and this after the change: rm_entry: 3060.1ns (212 - 65138, 953873 samples) rm_exit: 4244.1ns (12 - 9693408, 954331 samples) rm_intr: 1342.3ns (12 - 1104718, 3405326 samples) for a test of booting Fedora 20 big-endian to the login prompt. The time taken for a H_PROD hcall (which is handled in the host kernel) went down from about 35 microseconds to about 16 microseconds with this change. The noinline added to kvmppc_run_core turned out to be necessary for good performance, at least with gcc 4.9.2 as packaged with Fedora 21 and a little-endian POWER8 host. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
eddb60fb |
|
27-Mar-2015 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C This replaces the assembler code for kvmhv_commence_exit() with C code in book3s_hv_builtin.c. It also moves the IPI sending code that was in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq(). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
7d6c40da |
|
27-Mar-2015 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Use bitmap of active threads rather than count Currently, the entry_exit_count field in the kvmppc_vcore struct contains two 8-bit counts, one of the threads that have started entering the guest, and one of the threads that have started exiting the guest. This changes it to an entry_exit_map field which contains two bitmaps of 8 bits each. The advantage of doing this is that it gives us a bitmap of which threads need to be signalled when exiting the guest. That means that we no longer need to use the trick of setting the HDEC to 0 to pull the other threads out of the guest, which led in some cases to a spurious HDEC interrupt on the next guest entry. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
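To make the benefit of the two 8-bit bitmaps concrete, here is a small sketch of how a combined field could be decoded to find the threads that still need signalling. The layout (entry map in bits 0-7, exit map in bits 8-15) and all helper names are assumptions for illustration. The commit text only says the field holds two bitmaps of 8 bits each.

```c
#include <stdint.h>

/* Assumed layout: entry bitmap in bits 0-7, exit bitmap in bits 8-15. */
#define ENTRY_MAP(m)  ((m) & 0xffu)
#define EXIT_MAP(m)   (((m) >> 8) & 0xffu)

/* Record that thread t (0-7) has started entering the guest. */
uint32_t mark_entered(uint32_t map, unsigned int t)
{
    return map | (1u << t);
}

/* Record that thread t (0-7) has started exiting the guest. */
uint32_t mark_exiting(uint32_t map, unsigned int t)
{
    return map | (1u << (t + 8));
}

/* Threads that entered but have not begun exiting: these are the ones
 * that need to be signalled to leave the guest. */
unsigned int threads_to_signal(uint32_t map)
{
    return ENTRY_MAP(map) & ~EXIT_MAP(map) & 0xffu;
}
```

With plain counts, the exiting thread would know how many threads were in the guest but not which ones; the bitmap gives the exact set to target.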
|
#
e928e9cb |
|
20-Mar-2015 |
Michael Ellerman <michael@ellerman.id.au> |
KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation. Some PowerNV systems include a hardware random-number generator. This HWRNG is present on POWER7+ and POWER8 chips and is capable of generating one 64-bit random number every microsecond. The random numbers are produced by sampling a set of 64 unstable high-frequency oscillators and are almost completely entropic. PAPR defines an H_RANDOM hypercall which guests can use to obtain one 64-bit random sample from the HWRNG. This adds a real-mode implementation of the H_RANDOM hypercall. This hypercall was implemented in real mode because the latency of reading the HWRNG is generally small compared to the latency of a guest exit and entry for all the threads in the same virtual core. Userspace can detect the presence of the HWRNG and the H_RANDOM implementation by querying the KVM_CAP_PPC_HWRNG capability. The H_RANDOM hypercall implementation will only be invoked when the guest does an H_RANDOM hypercall if userspace first enables the in-kernel H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
90fd09f8 |
|
02-Dec-2014 |
Sam Bobroff <sam.bobroff@au1.ibm.com> |
KVM: PPC: Book3S HV: Improve H_CONFER implementation Currently the H_CONFER hcall is implemented in kernel virtual mode, meaning that whenever a guest thread does an H_CONFER, all the threads in that virtual core have to exit the guest. This is bad for performance because it interrupts the other threads even if they are doing useful work. The H_CONFER hcall is called by a guest VCPU when it is spinning on a spinlock and it detects that the spinlock is held by a guest VCPU that is currently not running on a physical CPU. The idea is to give this VCPU's time slice to the holder VCPU so that it can make progress towards releasing the lock. To avoid having the other threads exit the guest unnecessarily, we add a real-mode implementation of H_CONFER that checks whether the other threads are doing anything. If all the other threads are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER), it returns H_TOO_HARD which causes a guest exit and allows the H_CONFER to be handled in virtual mode. Otherwise it spins for a short time (up to 10 microseconds) to give other threads the chance to observe that this thread is trying to confer. The spin loop also terminates when any thread exits the guest or when all other threads are idle or trying to confer. If the timeout is reached, the H_CONFER returns H_SUCCESS. In this case the guest VCPU will recheck the spinlock word and most likely call H_CONFER again. This also improves the implementation of the H_CONFER virtual mode handler. If the VCPU is part of a virtual core (vcore) which is runnable, there will be a 'runner' VCPU which has taken responsibility for running the vcore. In this case we yield to the runner VCPU rather than the target VCPU. We also introduce a check on the target VCPU's yield count: if it differs from the yield count passed to H_CONFER, the target VCPU has run since H_CONFER was called and may have already released the lock. This check is required by PAPR. 
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
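The real-mode decision described in the commit above can be sketched in plain C. This is a userspace illustration only, not the kernel code: the `sketch_` names, the bitmask layout, and the use of an array of polling snapshots in place of a 10 microsecond timebase deadline are all assumptions made for the example. Returning H_SUCCESS when another thread is exiting the guest is likewise an assumption, since the message only says that the spin loop terminates in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_hret { SKETCH_H_SUCCESS, SKETCH_H_TOO_HARD };

/* Hypothetical snapshot of the virtual core taken on one spin iteration. */
struct sketch_poll {
    unsigned int running;    /* bitmask of threads that entered the guest */
    unsigned int ceded;      /* bitmask of threads idle in H_CEDE */
    unsigned int conferring; /* bitmask of threads in H_CONFER (includes caller) */
    bool exiting;            /* some thread is leaving the guest */
};

/*
 * Each array element stands in for one pass of the spin loop; running out
 * of elements models the timeout being reached.
 */
static enum sketch_hret sketch_rm_h_confer(const struct sketch_poll *polls,
                                           size_t npolls)
{
    size_t i;

    assert(polls != NULL || npolls == 0);
    for (i = 0; i < npolls; i++) {
        /* Stop spinning as soon as any thread exits the guest. */
        if (polls[i].exiting)
            break;
        /* Every thread is idle or conferring: hand off to virtual mode. */
        if ((polls[i].ceded | polls[i].conferring) == polls[i].running)
            return SKETCH_H_TOO_HARD;
    }
    /* Timeout (or exit): the guest rechecks the lock and may confer again. */
    return SKETCH_H_SUCCESS;
}
```

With four threads running (mask `0xF`), the function keeps spinning while at least one other thread is doing useful work and returns `SKETCH_H_TOO_HARD` only once the ceded and conferring masks together cover all four.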
#
c17b98cf |
|
02-Dec-2014 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Remove code for PPC970 processors This removes the code that was added to enable HV KVM to work on PPC970 processors. The PPC970 is an old CPU that doesn't support virtualizing guest memory. Removing PPC970 support also lets us remove the code for allocating and managing contiguous real-mode areas, the code for the !kvm->arch.using_mmu_notifiers case, the code for pinning pages of guest memory when first accessed and keeping track of which pages have been pinned, and the code for handling H_ENTER hypercalls in virtual mode. Book3S HV KVM is now supported only on POWER7 and POWER8 processors. The KVM_CAP_PPC_RMA capability now always returns 0. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
68cf0d64 |
|
17-Sep-2014 |
Anton Blanchard <anton@samba.org> |
powerpc: Remove superfluous bootmem includes Lots of places included bootmem.h even when not using bootmem. Signed-off-by: Anton Blanchard <anton@samba.org> Tested-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
14ed7409 |
|
17-Sep-2014 |
Anton Blanchard <anton@samba.org> |
powerpc: Remove some old bootmem related comments Now bootmem is gone from powerpc we can remove comments mentioning it. Signed-off-by: Anton Blanchard <anton@samba.org> Tested-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
cec26bc3 |
|
29-Sep-2014 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode We use cma reserved area for creating guest hash page table. Don't do the reservation in non-hypervisor mode. This avoids unnecessary CMA reservation when booting with limited memory configs like fadump and kdump. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c04fa583 |
|
13-Aug-2014 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
PPC, KVM, CMA: Fix regression caused by wrong get_order() use fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no functional change but this is not true as it calls get_order() (which takes bytes) where it should have called order_base_2() and the kernel stops on VM_BUG_ON(). This replaces get_order() with order_base_2() (round-up version of ilog2). Suggested-by: Paul Mackerras <paulus@samba.org> Cc: Alexander Graf <agraf@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
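The difference between the two helpers named in the commit above is easy to see with userspace stand-ins. The functions below are illustrative re-implementations, not the kernel's own definitions, and the 4 KiB page size is an assumption chosen for the example. The sketch also assumes the value being converted was a page count rather than a byte count, which is what the message implies.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages, for illustration only */

/* Stand-in for order_base_2(): ceil(log2(n)) for 1 <= n <= 2^63. */
static unsigned int sketch_order_base_2(uint64_t n)
{
    unsigned int order = 0;

    assert(n > 0 && n <= ((uint64_t)1 << 63));
    while (((uint64_t)1 << order) < n)
        order++;
    return order;
}

/*
 * Stand-in for get_order(): the smallest page order k such that
 * (page size << k) covers `size` bytes.
 */
static unsigned int sketch_get_order(uint64_t size)
{
    uint64_t pages;

    assert(size > 0 && size <= ((uint64_t)1 << 62));
    pages = (size + ((uint64_t)1 << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
    return sketch_order_base_2(pages);
}
```

Under these assumptions, an alignment of 64 pages gives `sketch_order_base_2(64) == 6`, while `sketch_get_order(64)` treats 64 as a byte count, rounds it up to a single page, and returns 0.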
#
c1f733aa |
|
06-Aug-2014 |
Joonsoo Kim <iamjoonsoo.kim@lge.com> |
mm, CMA: change cma_declare_contiguous() to obey coding convention Conventionally, we put output param to the end of param list and put the 'base' ahead of 'size', but cma_declare_contiguous() doesn't look like that, so change it. Additionally, move down cma_areas reference code to the position where it is really needed. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Alexander Graf <agraf@suse.de> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Gleb Natapov <gleb@kernel.org> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fc95ca72 |
|
06-Aug-2014 |
Joonsoo Kim <iamjoonsoo.kim@lge.com> |
PPC, KVM, CMA: use general CMA reserved area management framework Now, we have general CMA reserved area management framework, so use it for future maintainability. There is no functional change. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Alexander Graf <agraf@suse.de> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Gleb Natapov <gleb@kernel.org> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ae2113a4 |
|
01-Jun-2014 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL capability is used to enable or disable in-kernel handling of an hcall, that the hcall is actually implemented by the kernel. If not an EINVAL error is returned. This also checks the default-enabled list of hcalls and prints a warning if any hcall there is not actually implemented. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
441c19c8 |
|
23-May-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/kvm/book3s_hv: Rework the secondary inhibit code As part of the support for split core on POWER8, we want to be able to block splitting of the core while KVM VMs are active. The logic to do that would be exactly the same as the code we currently have for inhibiting onlining of secondaries. Instead of adding an identical mechanism to block split core, rework the secondary inhibit code to be a "HV KVM is active" check. We can then use that in both the cpu hotplug code and the upcoming split core code. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Alexander Graf <agraf@suse.de> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
6c45b810 |
|
01-Jul-2013 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
powerpc/kvm: Contiguous memory allocator based RMA allocation Older versions of the Power architecture use the Real Mode Offset register and Real Mode Limit Selector for mapping the guest Real Mode Area. The guest RMA should be physically contiguous since we use the range when address translation is not enabled. This patch switches the RMA allocation code to use the contiguous memory allocator. The patch also removes the linear allocator, which is not used any more. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
fa61a4e3 |
|
01-Jul-2013 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
powerpc/kvm: Contiguous memory allocator based hash page table allocation The Powerpc architecture uses a hash based page table mechanism for mapping virtual addresses to physical addresses. The architecture requires this hash page table to be physically contiguous. With KVM on Powerpc we currently use an early reservation mechanism for allocating the guest hash page table. This implies that we need to reserve a big memory region to ensure we can create a large number of guests simultaneously with KVM on Power. Another disadvantage is that the reserved memory is not available to the rest of the subsystems, and that implies we limit the total available memory in the host. This patch series switches the guest hash page table allocation to use the contiguous memory allocator. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
1340f3e8 |
|
05-Aug-2012 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Quieten message about allocating linear regions This is printed once for every RMA or HPT region that gets preallocated. If one preallocates hundreds of such regions (in order to run hundreds of KVM guests), that gets rather painful, so make it a bit quieter. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
32fad281 |
|
03-May-2012 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Make the guest hash table size configurable This adds a new ioctl to enable userspace to control the size of the guest hashed page table (HPT) and to clear it out when resetting the guest. The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter a pointer to a u32 containing the desired order of the HPT (log base 2 of the size in bytes), which is updated on successful return to the actual order of the HPT which was allocated. There must be no vcpus running at the time of this ioctl. To enforce this, we now keep a count of the number of vcpus running in kvm->arch.vcpus_running. If the ioctl is called when a HPT has already been allocated, we don't reallocate the HPT but just clear it out. We first clear the kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold the kvm->lock mutex, it will prevent any vcpus from starting to run until we're done, and (b) it means that the first vcpu to run after we're done will re-establish the VRMA if necessary. If userspace doesn't call this ioctl before running the first vcpu, the kernel will allocate a default-sized HPT at that point. We do it then rather than when creating the VM, as the code did previously, so that userspace has a chance to do the ioctl if it wants. When allocating the HPT, we can allocate either from the kernel page allocator, or from the preallocated pool. If userspace is asking for a different size from the preallocated HPTs, we first try to allocate using the kernel page allocator. Then we try to allocate from the preallocated pool, and then if that fails, we try allocating decreasing sizes from the kernel page allocator, down to the minimum size allowed (256kB). Note that the kernel page allocator limits allocations to 1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to 16MB (on 64-bit powerpc, at least). Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix module compilation] Signed-off-by: Alexander Graf <agraf@suse.de>
|
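The allocation order described in the commit above can be modelled as a small decision function. This is a simplified sketch rather than the kernel implementation: the `sketch_` names are hypothetical, and the host is reduced to two assumptions, namely the largest order the page allocator can currently satisfy and whether the preallocated pool still holds an HPT. The minimum order of 18 corresponds to the 256kB minimum stated in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MIN_HPT_ORDER 18 /* 256kB, the minimum size in the message */

/* Hypothetical, simplified view of what the host can provide right now. */
struct sketch_host {
    unsigned int max_page_order; /* largest order the page allocator can satisfy */
    bool pool_has_hpt;           /* preallocated pool still has a free HPT */
    unsigned int pool_order;     /* order of the preallocated HPTs */
};

/* Returns the order of the HPT that would be allocated, or -1 on failure. */
static int sketch_alloc_hpt_order(const struct sketch_host *h, unsigned int want)
{
    unsigned int order = want;

    assert(h != NULL && want >= SKETCH_MIN_HPT_ORDER);

    /* A size different from the preallocated HPTs: page allocator first. */
    if (order != h->pool_order && order <= h->max_page_order)
        return (int)order;

    /* Next, try the preallocated pool. */
    if (h->pool_has_hpt)
        return (int)h->pool_order;

    /* Lastly, try successively smaller sizes from the page allocator. */
    while (order > SKETCH_MIN_HPT_ORDER) {
        order--;
        if (order <= h->max_page_order)
            return (int)order;
    }
    return -1;
}
```

The returned value mirrors the ioctl's behaviour of writing back the order that was actually allocated, which may be smaller than the order userspace requested.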
#
b4e51229 |
|
02-Feb-2012 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Book3S HV: Fix kvm_alloc_linear in case where no linears exist In kvm_alloc_linear we were using and dereferencing ri after the list_for_each_entry had come to the end of the list. In that situation, ri is not really defined and probably points to the list head. This will happen every time if the free_linears list is empty, for instance. This led to a NULL pointer dereference crash in memset on POWER7 while trying to allocate an HPT in the case where no HPTs were preallocated. This fixes it by using a separate variable for the return value from the loop iterator. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
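The pattern behind the fix in the commit above, a separate return variable that is only set when the loop finds a match, can be shown with a minimal circular list. The list type, the `sketch_` names, and the `type` and `in_use` fields are assumptions for this example; the kernel uses its own `list_head` helpers and a different structure.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular list with a sentinel head (stand-in for list_head). */
struct sketch_list {
    struct sketch_list *next;
};

/* Hypothetical preallocated region. */
struct sketch_linear {
    int type;   /* e.g. RMA or HPT */
    int in_use; /* nonzero once handed out */
    struct sketch_list list;
};

#define sketch_entry(ptr) \
    ((struct sketch_linear *)((char *)(ptr) - offsetof(struct sketch_linear, list)))

static struct sketch_linear *sketch_alloc_linear(struct sketch_list *head, int type)
{
    struct sketch_linear *ret = NULL; /* stays NULL unless a match is found */
    struct sketch_list *pos;

    assert(head != NULL);
    for (pos = head->next; pos != head; pos = pos->next) {
        struct sketch_linear *ri = sketch_entry(pos);

        if (ri->type != type || ri->in_use)
            continue;
        ri->in_use = 1;
        ret = ri; /* record the result; the cursor is never used after the loop */
        break;
    }
    return ret;
}
```

On an empty list the loop body never runs and the function returns `NULL`, so the caller can test the result instead of touching a cursor that points back at the list head.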
#
d2a1b483 |
|
16-Jan-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Add HPT preallocator We're currently allocating 16MB of linear memory on demand when creating a guest. That does work sometimes, but finding 16MB of linear memory available in the system at runtime is definitely not a given. So let's add another command line option similar to the RMA preallocator, that we can use to keep a pool of page tables around. Now, when a guest gets created it has a pretty low chance of receiving an OOM. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
b7f5d011 |
|
17-Jan-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Initialize linears with zeros RMAs and HPT preallocated spaces should be zeroed, so we don't accidentally leak information from previous VM executions. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
b4e70611 |
|
16-Jan-2012 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Convert RMA allocation into generic code We have code to allocate big chunks of linear memory on bootup for later use. This code is currently used for RMA allocation, but can be useful beyond that extent. Make it generic so we can reuse it for other stuff later. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
6c9b7c40 |
|
07-Nov-2011 |
Nishanth Aravamudan <nacc@us.ibm.com> |
KVM: PPC: annotate kvm_rma_init as __init kvm_rma_init() is only called at boot-time, by setup_arch, which is also __init. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
66b15db6 |
|
27-May-2011 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
powerpc: add export.h to files making use of EXPORT_SYMBOL With module.h being implicitly everywhere via device.h, the absence of explicitly including something for EXPORT_SYMBOL went unnoticed. Since we are heading to fix things up and clean module.h from the device.h file, we need to explicitly include these files now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
9e368f29 |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: book3s_hv: Add support for PPC970-family processors This adds support for running KVM guests in supervisor mode on those PPC970 processors that have a usable hypervisor mode. Unfortunately, Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to 1), but the YDL PowerStation does have a usable hypervisor mode. There are several differences between the PPC970 and POWER7 in how guests are managed. These differences are accommodated using the CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature bits. Notably, on PPC970: * The LPCR, LPID or RMOR registers don't exist, and the functions of those registers are provided by bits in HID4 and one bit in HID0. * External interrupts can be directed to the hypervisor, but unlike POWER7 they are masked by MSR[EE] in non-hypervisor modes and use SRR0/1 not HSRR0/1. * There is no virtual RMA (VRMA) mode; the guest must use an RMO (real mode offset) area. * The TLB entries are not tagged with the LPID, so it is necessary to flush the whole TLB on partition switch. Furthermore, when switching partitions we have to ensure that no other CPU is executing the tlbie or tlbsync instructions in either the old or the new partition, otherwise undefined behaviour can occur. * The PMU has 8 counters (PMC registers) rather than 6. * The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist. * The SLB has 64 entries rather than 32. * There is no mediated external interrupt facility, so if we switch to a guest that has a virtual external interrupt pending but the guest has MSR[EE] = 0, we have to arrange to have an interrupt pending for it so that we can get control back once it re-enables interrupts. We do that by sending ourselves an IPI with smp_send_reschedule after hard-disabling interrupts. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
969391c5 |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to indicate that we have a usable hypervisor mode, and another to indicate that the processor conforms to PowerISA version 2.06. We also add another bit to indicate that the processor conforms to ISA version 2.01 and set that for PPC970 and derivatives. Some PPC970 chips (specifically those in Apple machines) have a hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode is not useful in the sense that there is no way to run any code in supervisor mode (HV=0 PR=0). On these processors, the LPES0 and LPES1 bits in HID4 are always 0, and we use that as a way of detecting that hypervisor mode is not useful. Where we have a feature section in assembly code around code that only applies on POWER7 in hypervisor mode, we use a construct like END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) The definition of END_FTR_SECTION_IFSET is such that the code will be enabled (not overwritten with nops) only if all bits in the provided mask are set. Note that the CPU feature check in __tlbie() only needs to check the ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called if we are running bare-metal, i.e. in hypervisor mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
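The "all bits in the mask must be set" rule that the commit above gives for END_FTR_SECTION_IFSET can be written as a one-line predicate. The bit positions and `SKETCH_` names below are made up for illustration and do not match the kernel's CPU_FTR_* values; the real mechanism patches code at boot rather than testing at run time.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit positions only; not the kernel's CPU_FTR_* values. */
#define SKETCH_FTR_HVMODE   (1UL << 0)
#define SKETCH_FTR_ARCH_206 (1UL << 1)
#define SKETCH_FTR_ARCH_201 (1UL << 2)

/* True when the guarded code would be kept (not overwritten with nops). */
static bool sketch_ftr_section_enabled(unsigned long cpu_features,
                                       unsigned long mask)
{
    assert(mask != 0);
    return (cpu_features & mask) == mask;
}
```

With the mask `SKETCH_FTR_HVMODE | SKETCH_FTR_ARCH_206`, a processor reporting both bits keeps the section, while one with a usable hypervisor mode but only the 2.01 bit, or one with the 2.06 bit but no hypervisor mode, does not.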
#
aa04b4cc |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests This adds infrastructure which will be needed to allow book3s_hv KVM to run on older POWER processors, including PPC970, which don't support the Virtual Real Mode Area (VRMA) facility, but only the Real Mode Offset (RMO) facility. These processors require a physically contiguous, aligned area of memory for each guest. When the guest does an access in real mode (MMU off), the address is compared against a limit value, and if it is lower, the address is ORed with an offset value (from the Real Mode Offset Register (RMOR)) and the result becomes the real address for the access. The size of the RMA has to be one of a set of supported values, which usually includes 64MB, 128MB, 256MB and some larger powers of 2. Since we are unlikely to be able to allocate 64MB or more of physically contiguous memory after the kernel has been running for a while, we allocate a pool of RMAs at boot time using the bootmem allocator. The size and number of the RMAs can be set using the kvm_rma_size=xx and kvm_rma_count=xx kernel command line options. KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability of the pool of preallocated RMAs. The capability value is 1 if the processor can use an RMA but doesn't require one (because it supports the VRMA facility), or 2 if the processor requires an RMA for each guest. This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the pool and returns a file descriptor which can be used to map the RMA. It also returns the size of the RMA in the argument structure. Having an RMA means we will get multiple KVM_SET_USER_MEMORY_REGION ioctl calls from userspace. To cope with this, we now preallocate the kvm->arch.ram_pginfo array when the VM is created with a size sufficient for up to 64GB of guest memory. Subsequently we will get rid of this array and use memory associated with each memslot instead.
This moves most of the code that translates the user addresses into host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level to kvmppc_core_prepare_memory_region. Also, instead of having to look up the VMA for each page in order to check the page size, we now check that the pages we get are compound pages of 16MB. However, if we are adding memory that is mapped to an RMA, we don't bother with calling get_user_pages_fast and instead just offset from the base pfn for the RMA. Typically the RMA gets added after vcpus are created, which makes it inconvenient to have the LPCR (logical partition control register) value in the vcpu->arch struct, since the LPCR controls whether the processor uses RMA or VRMA for the guest. This moves the LPCR value into the kvm->arch struct and arranges for the MER (mediated external request) bit, which is the only bit that varies between vcpus, to be set in assembly code when going into the guest if there is a pending external interrupt request. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
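The real-mode address calculation described in the commit above (compare against a limit, then OR with the offset) can be sketched as follows. The function name and parameters are hypothetical, and the behaviour for an out-of-range address is reduced to a boolean failure; what the hardware actually does in that case is not covered by the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of an RMO-style real-mode access.
 * addr:  address issued by the guest with the MMU off
 * limit: size of the guest's RMA in bytes
 * rmor:  offset value, assumed to be aligned to the RMA size
 * Returns false when addr is not below the limit.
 */
static bool sketch_rmo_translate(uint64_t addr, uint64_t limit,
                                 uint64_t rmor, uint64_t *real)
{
    assert(real != NULL);
    if (addr >= limit)
        return false;
    /* With an aligned RMA the OR cannot collide with any set offset bit. */
    *real = addr | rmor;
    return true;
}
```

Because the RMA is aligned to its own size, ORing the offset in gives the same result as adding it, which is one reason the area has to be both physically contiguous and aligned.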