#
2947a456 |
|
09-Jan-2024 |
Nathan Chancellor <nathan@kernel.org> |
treewide: update LLVM Bugzilla links LLVM moved their issue tracker from their own Bugzilla instance to GitHub issues. While all of the links are still valid, they may not necessarily show the most up to date information around the issues, as all updates will occur on GitHub, not Bugzilla. Another complication is that the Bugzilla issue number is not always the same as the GitHub issue number. Thankfully, LLVM maintains this mapping through two shortlinks: https://llvm.org/bz<num> -> https://bugs.llvm.org/show_bug.cgi?id=<num> https://llvm.org/pr<num> -> https://github.com/llvm/llvm-project/issues/<mapped_num> Switch all "https://bugs.llvm.org/show_bug.cgi?id=<num>" links to the "https://llvm.org/pr<num>" shortlink so that the links show the most up to date information. Each migrated issue links back to the Bugzilla entry, so there should be no loss of fidelity of information here. Link: https://lkml.kernel.org/r/20240109-update-llvm-links-v1-3-eb09b59db071@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Fangrui Song <maskray@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Mykola Lysenko <mykolal@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
7d370e18 |
|
01-Dec-2023 |
Jordan Niethe <jniethe5@gmail.com> |
KVM: PPC: Book3S HV nestedv2: Invalidate RPT before deleting a guest An L0 must invalidate the L2's RPT during H_GUEST_DELETE if this has not already been done. This is a slow operation that means H_GUEST_DELETE must return H_BUSY multiple times before completing. Invalidating the tables before deleting the guest so there is less work for the L0 to do. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231201132618.555031-2-vaibhav@linux.ibm.com
|
#
19d31c5f |
|
13-Sep-2023 |
Jordan Niethe <jniethe5@gmail.com> |
KVM: PPC: Add support for nestedv2 guests A series of hcalls have been added to the PAPR which allow a regular guest partition to create and manage guest partitions of its own. KVM already had an interface that allowed this on powernv platforms. This existing interface will now be called "nestedv1". The newly added PAPR interface will be called "nestedv2". PHYP will support the nestedv2 interface. At this time the host side of the nestedv2 interface has not been implemented on powernv but there is no technical reason why it could not be added. The nestedv1 interface is still supported. Add support to KVM to utilize these hcalls to enable running nested guests as a pseries guest on PHYP. Overview of the new hcall usage: - L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES() - L1 requests the L0 create a L2 with H_GUEST_CREATE() and receives a handle to use in future hcalls - L1 requests the L0 create a L2 vCPU with H_GUEST_CREATE_VCPU() - L1 sets up the L2 using H_GUEST_SET and the H_GUEST_VCPU_RUN input buffer - L1 requests the L0 runs the L2 vCPU using H_GUEST_VCPU_RUN() - L2 returns to L1 with an exit reason and L1 reads the H_GUEST_VCPU_RUN output buffer populated by the L0 - L1 handles the exit using H_GET_STATE if necessary - L1 reruns L2 vCPU with H_GUEST_VCPU_RUN - L1 frees the L2 in the L0 with H_GUEST_DELETE() Support for the new API is determined by trying H_GUEST_GET_CAPABILITIES. On a successful return, use the nestedv2 interface. Use the vcpu register state setters for tracking modified guest state elements and copy the thread wide values into the H_GUEST_VCPU_RUN input buffer immediately before running a L2. The guest wide elements can not be added to the input buffer so send them with a separate H_GUEST_SET call if necessary. Make the vcpu register getter load the corresponding value from the real host with H_GUEST_GET. To avoid unnecessarily calling H_GUEST_GET, track which values have already been loaded between H_GUEST_VCPU_RUN calls. If an element is present in the H_GUEST_VCPU_RUN output buffer it also does not need to be loaded again. Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com> Signed-off-by: Amit Machhiwal <amachhiw@linux.vnet.ibm.com> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230914030600.16993-11-jniethe5@gmail.com
|
#
dfcaacc8 |
|
13-Sep-2023 |
Jordan Niethe <jniethe5@gmail.com> |
KVM: PPC: Book3s HV: Hold LPIDs in an unsigned long The LPID register is 32 bits long. The host keeps the lpids for each guest in an unsigned word struct kvm_arch. Currently, LPIDs are already limited by mmu_lpid_bits and KVM_MAX_NESTED_GUESTS_SHIFT. The nestedv2 API returns a 64 bit "Guest ID" to be used be the L1 host for each L2 guest. This value is used as an lpid, e.g. it is the parameter used by H_RPT_INVALIDATE. To minimize needless special casing it makes sense to keep this "Guest ID" in struct kvm_arch::lpid. This means that struct kvm_arch::lpid is too small so prepare for this and make it an unsigned long. This is not a problem for the KVM-HV and nestedv1 cases as their lpid values are already limited to valid ranges so in those contexts the lpid can be used as an unsigned word safely as needed. In the PAPR, the H_RPT_INVALIDATE pid/lpid parameter is already specified as an unsigned long so change pseries_rpt_invalidate() to match that. Update the callers of pseries_rpt_invalidate() to also take an unsigned long if they take an lpid value. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230914030600.16993-10-jniethe5@gmail.com
|
#
6cd5c1db |
|
30-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Set SRR1[PREFIX] bit on injected interrupts Pass the hypervisor (H)SRR1[PREFIX] indication through to synchronous interrupts injected into the guest. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230330103224.3589928-3-npiggin@gmail.com
|
#
460ba21d |
|
30-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Permit SRR1 flags in more injected interrupt types The prefix architecture in ISA v3.1 introduces a prefixed bit in SRR1 for many types of synchronous interrupts which is set when the interrupt is caused by a prefixed instruction. This requires KVM to be able to set this bit when injecting interrupts into a guest. Plumb through the SRR1 "flags" argument to the core_queue APIs where it's missing for this. For now they are set to 0, which is no change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup kvmppc_core_queue_alignment() in booke.c] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230330103224.3589928-2-npiggin@gmail.com
|
#
20ec3ebd |
|
16-Aug-2022 |
Chao Peng <chao.p.peng@linux.intel.com> |
KVM: Rename mmu_notifier_* to mmu_invalidate_* The motivation of this renaming is to make these variables and related helper functions less mmu_notifier bound and can also be used for non mmu_notifier based page invalidation. mmu_invalidate_* was chosen to better describe the purpose of 'invalidating' a page that those variables are used for. - mmu_notifier_seq/range_start/range_end are renamed to mmu_invalidate_seq/range_start/range_end. - mmu_notifier_retry{_hva} helper functions are renamed to mmu_invalidate_retry{_hva}. - mmu_notifier_count is renamed to mmu_invalidate_in_progress to avoid confusion with mn_active_invalidate_count. - While here, also update kvm_inc/dec_notifier_count() to kvm_mmu_invalidate_begin/end() to match the change for mmu_notifier_count. No functional change intended. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
46d60bdb |
|
06-May-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Include asm/firmware.h in all users of firmware_has_feature() Trying to remove asm/ppc_asm.h from all places that don't need it leads to several failures linked to firmware_has_feature(). To fix it, include asm/firmware.h in all files using firmware_has_feature() All users found with: git grep -L "firmware\.h" ` git grep -l "firmware_has_feature("` Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/11956ec181a034b51a881ac9c059eea72c679a73.1651828453.git.christophe.leroy@csgroup.eu
|
#
2852ebfa |
|
02-Mar-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting The L1 should not be able to adjust LPES mode for the L2. Setting LPES if the L0 needs it clear would cause external interrupts to be sent to L2 and missed by the L0. Clearing LPES when it may be set, as typically happens with XIVE enabled could cause a performance issue despite having no native XIVE support in the guest, because it will cause mediated interrupts for the L2 to be taken in HV mode, which then have to be injected. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220303053315.1056880-7-npiggin@gmail.com
|
#
03a2e65f |
|
23-Jan-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum Rather than tie this to KVMPPC_NR_LPIDS which is becoming more dynamic, fix it to 4096 (12-bits) explicitly for now. kvmhv_get_nested() does not have to check against KVM_MAX_NESTED_GUESTS because the L1 partition table registration hcall already did that, and it checks against the partition table size. This patch also puts all the partition table size calculations into the same form, using 12 for the architected size field shift and 4 for the shift corresponding to the partition table entry size. Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-of-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-6-npiggin@gmail.com
|
#
c0f00a18 |
|
23-Jan-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV Nested: Change nested guest lookup to use idr This removes the fixed sized kvm->arch.nested_guests array. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-5-npiggin@gmail.com
|
#
2031f287 |
|
14-Apr-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused Add wrappers to acquire/release KVM's SRCU lock when stashing the index in vcpu->src_idx, along with rudimentary detection of illegal usage, e.g. re-acquiring SRCU and thus overwriting vcpu->src_idx. Because the SRCU index is (currently) either 0 or 1, illegal nesting bugs can go unnoticed for quite some time and only cause problems when the nested lock happens to get a different index. Wrap the WARNs in PROVE_RCU=y, and make them ONCE, otherwise KVM will likely yell so loudly that it will bring the kernel to its knees. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Fabiano Rosas <farosas@linux.ibm.com> Message-Id: <20220415004343.2203171-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
22f7ff0d |
|
22-Jan-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV Nested: Fix nested HFSCR being clobbered with multiple vCPUs The L0 is storing HFSCR requested by the L1 for the L2 in struct kvm_nested_guest when the L1 requests a vCPU enter L2. kvm_nested_guest is not a per-vCPU structure. Hilarity ensues. Fix it by moving the nested hfscr into the vCPU structure together with the other per-vCPU nested fields. Fixes: 8b210a880b35 ("KVM: PPC: Book3S HV Nested: Make nested HFSCR state accessible") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220122105530.3477250-1-npiggin@gmail.com
|
#
a54d8066 |
|
06-Dec-2021 |
Maciej S. Szmigiero <maciej.szmigiero@oracle.com> |
KVM: Keep memslots in tree-based structures instead of array-based ones The current memslot code uses a (reverse gfn-ordered) memslot array for keeping track of them. Because the memslot array that is currently in use cannot be modified every memslot management operation (create, delete, move, change flags) has to make a copy of the whole array so it has a scratch copy to work on. Strictly speaking, however, it is only necessary to make copy of the memslot that is being modified, copying all the memslots currently present is just a limitation of the array-based memslot implementation. Two memslot sets, however, are still needed so the VM continues to run on the currently active set while the requested operation is being performed on the second, currently inactive one. In order to have two memslot sets, but only one copy of actual memslots it is necessary to split out the memslot data from the memslot sets. The memslots themselves should be also kept independent of each other so they can be individually added or deleted. These two memslot sets should normally point to the same set of memslots. They can, however, be desynchronized when performing a memslot management operation by replacing the memslot to be modified by its copy. After the operation is complete, both memslot sets once again point to the same, common set of memslot data. This commit implements the aforementioned idea. For tracking of gfns an ordinary rbtree is used since memslots cannot overlap in the guest address space and so this data structure is sufficient for ensuring that lookups are done quickly. The "last used slot" mini-caches (both per-slot set one and per-vCPU one), that keep track of the last found-by-gfn memslot, are still present in the new code. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <17c0cf3663b760a0d3753d4ac08c0753e941b811.1638817641.git.maciej.szmigiero@oracle.com>
|
#
79202090 |
|
01-Sep-2021 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Book3S: Suppress failed alloc warning in H_COPY_TOFROM_GUEST H_COPY_TOFROM_GUEST is an hcall for an upper level VM to access its nested VMs memory. The userspace can trigger WARN_ON_ONCE(!(gfp & __GFP_NOWARN)) in __alloc_pages() by constructing a tiny VM which only does H_COPY_TOFROM_GUEST with a too big GPR9 (number of bytes to copy). This silences the warning by adding __GFP_NOWARN. Spotted by syzkaller. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210901084550.1658699-1-aik@ozlabs.ru
|
#
46dea77f |
|
23-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV Nested: Avoid extra mftb() in nested entry mftb() is expensive and one can be avoided on nested guest dispatch. If the time checking code distinguishes between the L0 timer and the nested HV timer, then both can be tested in the same place with the same mftb() value. This also nicely illustrates the relationship between the L0 and nested HV timers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-45-npiggin@gmail.com
|
#
3c1a4322 |
|
23-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebase Change dec_expires to be relative to the guest timebase, and allow it to be moved into low level P9 guest entry functions, to improve SPR access scheduling. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-23-npiggin@gmail.com
|
#
f2e29db1 |
|
11-Aug-2021 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Book3S HV Nested: save_hv_return_state does not require trap argument vcpu is already anargument so vcpu->arch.trap can be used directly. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210811160134.904987-9-npiggin@gmail.com
|
#
8b210a88 |
|
11-Aug-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV Nested: Make nested HFSCR state accessible When the L0 runs a nested L2, there are several permutations of HFSCR that can be relevant. The HFSCR that the L1 vcpu L1 requested, the HFSCR that the L1 vcpu may use, and the HFSCR that is actually being used to run the L2. The L1 requested HFSCR is not accessible outside the nested hcall handler, so copy that into a new kvm_nested_guest.hfscr field. The permitted HFSCR is taken from the HFSCR that the L1 runs with, which is also not accessible while the hcall is being made. Move this into a new kvm_vcpu_arch.hfscr_permitted field. These will be used by the next patch to improve facility handling for nested guests, and later by facility demand faulting patches. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210811160134.904987-7-npiggin@gmail.com
|
#
7487cabc |
|
11-Aug-2021 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Book3S HV Nested: Sanitise vcpu registers As one of the arguments of the H_ENTER_NESTED hypercall, the nested hypervisor (L1) prepares a structure containing the values of various hypervisor-privileged registers with which it wants the nested guest (L2) to run. Since the nested HV runs in supervisor mode it needs the host to write to these registers. To stop a nested HV manipulating this mechanism and using a nested guest as a proxy to access a facility that has been made unavailable to it, we have a routine that sanitises the values of the HV registers before copying them into the nested guest's vcpu struct. However, when coming out of the guest the values are copied as they were back into L1 memory, which means that any sanitisation we did during guest entry will be exposed to L1 after H_ENTER_NESTED returns. This patch alters this sanitisation to have effect on the vcpu->arch registers directly before entering and after exiting the guest, leaving the structure that is copied back into L1 unchanged (except when we really want L1 to access the value, e.g the Cause bits of HFSCR). Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20210811160134.904987-6-npiggin@gmail.com
|
#
d9c57d3e |
|
08-Jul-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state The H_ENTER_NESTED hypercall is handled by the L0, and it is a request by the L1 to switch the context of the vCPU over to that of its L2 guest, and return with an interrupt indication. The L1 is responsible for switching some registers to guest context, and the L0 switches others (including all the hypervisor privileged state). If the L2 MSR has TM active, then the L1 is responsible for recheckpointing the L2 TM state. Then the L1 exits to L0 via the H_ENTER_NESTED hcall, and the L0 saves the TM state as part of the exit, and then it recheckpoints the TM state as part of the nested entry and finally HRFIDs into the L2 with TM active MSR. Not efficient, but about the simplest approach for something that's horrendously complicated. Problems arise if the L1 exits to the L0 with a TM state which does not match the L2 TM state being requested. For example if the L1 is transactional but the L2 MSR is non-transactional, or vice versa. The L0's HRFID can take a TM Bad Thing interrupt and crash. Fix this by disallowing H_ENTER_NESTED in TM[T] state entirely, and then ensuring that if the L1 is suspended then the L2 must have TM active, and if the L1 is not suspended then the L2 must not have TM active. Fixes: 360cae313702 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall") Cc: stable@vger.kernel.org # v4.20+ Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
51696f39 |
|
21-Jun-2021 |
Nathan Chancellor <nathan@kernel.org> |
KVM: PPC: Book3S HV: Workaround high stack usage with clang LLVM does not emit optimal byteswap assembly, which results in high stack usage in kvmhv_enter_nested_guest() due to the inlining of byteswap_pt_regs(). With LLVM 12.0.0: arch/powerpc/kvm/book3s_hv_nested.c:289:6: error: stack frame size of 2512 bytes in function 'kvmhv_enter_nested_guest' [-Werror,-Wframe-larger-than=] long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) ^ 1 error generated. While this gets fixed in LLVM, mark byteswap_pt_regs() as noinline_for_stack so that it does not get inlined and break the build due to -Werror by default in arch/powerpc/. Not inlining saves approximately 800 bytes with LLVM 12.0.0: arch/powerpc/kvm/book3s_hv_nested.c:290:6: warning: stack frame size of 1728 bytes in function 'kvmhv_enter_nested_guest' [-Wframe-larger-than=] long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) ^ 1 warning generated. Cc: stable@vger.kernel.org # v4.20+ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/1292 Link: https://bugs.llvm.org/show_bug.cgi?id=49610 Link: https://lore.kernel.org/r/202104031853.vDT0Qjqj-lkp@intel.com/ Link: https://gist.github.com/ba710e3703bf45043a31e2806c843ffd Link: https://lore.kernel.org/r/20210621182440.990242-1-nathan@kernel.org
|
#
81468083 |
|
21-Jun-2021 |
Bharata B Rao <bharata@linux.ibm.com> |
KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall H_RPT_INVALIDATE if available. The availability of this hcall is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions DT property. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-7-bharata@linux.ibm.com
|
#
53324b51 |
|
21-Jun-2021 |
Bharata B Rao <bharata@linux.ibm.com> |
KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE Enable support for process-scoped invalidations from nested guests and partition-scoped invalidations for nested guests. Process-scoped invalidations for any level of nested guests are handled by implementing H_RPT_INVALIDATE handler in the nested guest exit path in L0. Partition-scoped invalidation requests are forwarded to the right nested guest, handled there and passed down to L0 for eventual handling. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> [aneesh: Nested guest partition-scoped invalidation changes] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Squash in fixup patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-5-bharata@linux.ibm.com
|
#
732f21a3 |
|
11-Apr-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Ensure MSR[HV] is always clear in guest MSR Rather than clear the HV bit from the MSR at guest entry, make it clear that the hypervisor does not allow the guest to set the bit. The HV clear is kept in guest entry for now, but a future patch will warn if it is set. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-13-npiggin@gmail.com
|
#
946cf44a |
|
11-Apr-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Ensure MSR[ME] is always set in guest MSR Rather than add the ME bit to the MSR at guest entry, make it clear that the hypervisor does not allow the guest to clear the bit. The ME set is kept in guest entry for now, but a future patch will warn if it's not present. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-12-npiggin@gmail.com
|
#
67145ef4 |
|
11-Apr-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Add a function to filter guest LPCR bits Guest LPCR depends on hardware type, and future changes will add restrictions based on errata and guest MMU mode. Move this logic to a common function and use it for the cases where the guest wants to update its LPCR (or the LPCR of a nested guest). This also adds a warning in other places that set or update LPCR if we try to set something that would have been disallowed by the filter, as a sanity check. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-4-npiggin@gmail.com
|
#
a19b70ab |
|
11-Apr-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Nested move LPCR sanitising to sanitise_hv_regs This will get a bit more complicated in future patches. Move it into the helper function. This change allows the L1 hypervisor to determine some of the LPCR bits that the L0 is using to run it, which could be a privilege violation (LPCR is HV-privileged), although the same problem exists now for HFSCR for example. Discussion of the HV privilege issue is ongoing and can be resolved with a later change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-3-npiggin@gmail.com
|
#
bd1de1a0 |
|
16-Dec-2020 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR KVM code assumes single DAWR everywhere. Add code to support 2nd DAWR. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/ unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR. Also, KVM will support 2nd DAWR only if CPU_FTR_DAWR1 is set. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
122954ed7 |
|
16-Dec-2020 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
KVM: PPC: Book3S HV: Rename current DAWR macros and variables Power10 is introducing a second DAWR (Data Address Watchpoint Register). Use real register names (with suffix 0) from ISA for current macros and variables used by kvm. One exception is KVM_REG_PPC_DAWR. Keep it as it is because it's uapi so changing it will break userspace. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
afe75049 |
|
16-Dec-2020 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
KVM: PPC: Book3S HV: Allow nested guest creation when L0 hv_guest_state > L1 On powerpc, L1 hypervisor takes help of L0 using H_ENTER_NESTED hcall to load L2 guest state in cpu. L1 hypervisor prepares the L2 state in struct hv_guest_state and passes a pointer to it via hcall. Using that pointer, L0 reads/writes that state directly from/to L1 memory. Thus L0 must be aware of hv_guest_state layout of L1. Currently it uses version field to achieve this. i.e. If L0 hv_guest_state.version != L1 hv_guest_state.version, L0 won't allow nested kvm guest. This restriction can be loosened up a bit. L0 can be taught to understand older layout of hv_guest_state, if we restrict the new members to be added only at the end, i.e. we can allow nested guest even when L0 hv_guest_state.version > L1 hv_guest_state.version. Though, the other way around is not possible. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
cf59eb13 |
|
21-Sep-2020 |
Wang Wensheng <wangwensheng4@huawei.com> |
KVM: PPC: Book3S: Fix symbol undeclared warnings Build the kernel with `C=2`: arch/powerpc/kvm/book3s_hv_nested.c:572:25: warning: symbol 'kvmhv_alloc_nested' was not declared. Should it be static? arch/powerpc/kvm/book3s_64_mmu_radix.c:350:6: warning: symbol 'kvmppc_radix_set_pte_at' was not declared. Should it be static? arch/powerpc/kvm/book3s_hv.c:3568:5: warning: symbol 'kvmhv_p9_guest_entry' was not declared. Should it be static? arch/powerpc/kvm/book3s_hv_rm_xics.c:767:15: warning: symbol 'eoi_rc' was not declared. Should it be static? arch/powerpc/kvm/book3s_64_vio_hv.c:240:13: warning: symbol 'iommu_tce_kill_rm' was not declared. Should it be static? arch/powerpc/kvm/book3s_64_vio.c:492:6: warning: symbol 'kvmppc_tce_iommu_do_map' was not declared. Should it be static? arch/powerpc/kvm/book3s_pr.c:572:6: warning: symbol 'kvmppc_set_pvr_pr' was not declared. Should it be static? Those symbols are used only in the files that define them so make them static to fix the warnings. Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
1508c22f |
|
08-Jun-2020 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Protect kvm_vcpu_read_guest with srcu locks The kvm_vcpu_read_guest/kvm_vcpu_write_guest used for nested guests eventually call srcu_dereference_check to dereference a memslot and lockdep produces a warning as neither kvm->slots_lock nor kvm->srcu lock is held and kvm->users_count is above zero (>100 in fact). This wraps mentioned VCPU read/write helpers in srcu read lock/unlock as it is done in other places. This uses vcpu->srcu_idx when possible. These helpers are only used for nested KVM so this may explain why we did not see these before. Here is an example of a warning: ============================= WARNING: suspicious RCU usage 5.7.0-rc3-le_dma-bypass.3.2_a+fstn1 #897 Not tainted ----------------------------- include/linux/kvm_host.h:633 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by qemu-system-ppc/2752: #0: c000200359016be0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x144/0xd80 [kvm] stack backtrace: CPU: 80 PID: 2752 Comm: qemu-system-ppc Not tainted 5.7.0-rc3-le_dma-bypass.3.2_a+fstn1 #897 Call Trace: [c0002003591ab240] [c000000000b23ab4] dump_stack+0x190/0x25c (unreliable) [c0002003591ab2b0] [c00000000023f954] lockdep_rcu_suspicious+0x140/0x164 [c0002003591ab330] [c008000004a445f8] kvm_vcpu_gfn_to_memslot+0x4c0/0x510 [kvm] [c0002003591ab3a0] [c008000004a44c18] kvm_vcpu_read_guest+0xa0/0x180 [kvm] [c0002003591ab410] [c008000004ff9bd8] kvmhv_enter_nested_guest+0x90/0xb80 [kvm_hv] [c0002003591ab980] [c008000004fe07bc] kvmppc_pseries_do_hcall+0x7b4/0x1c30 [kvm_hv] [c0002003591aba10] [c008000004fe5d30] kvmppc_vcpu_run_hv+0x10a8/0x1a30 [kvm_hv] [c0002003591abae0] [c008000004a5d954] kvmppc_vcpu_run+0x4c/0x70 [kvm] [c0002003591abb10] [c008000004a56e54] kvm_arch_vcpu_ioctl_run+0x56c/0x7c0 [kvm] [c0002003591abba0] [c008000004a3ddc4] kvm_vcpu_ioctl+0x4ac/0xd80 [kvm] [c0002003591abd20] [c0000000006ebb58] ksys_ioctl+0x188/0x210 [c0002003591abd70] [c0000000006ebc28] sys_ioctl+0x48/0xb0 [c0002003591abdb0] [c000000000042764] system_call_exception+0x1d4/0x2e0 [c0002003591abe20] [c00000000000cce8] system_call_common+0xe8/0x214 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
e881bfaf |
|
10-Jun-2020 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Fix nested guest RC bits update Before commit 6cdf30375f82 ("powerpc/kvm/book3s: Use kvm helpers to walk shadow or secondary table") we called __find_linux_pte() with a page table pointer from a kvm_nested_guest struct but now we rely on kvmhv_find_nested() which takes an L1 LPID and returns a kvm_nested_guest pointer, however we pass a L0 LPID there and the L2 guest hangs. This fixes the LPID passed to kvmppc_hv_handle_set_rc(). Fixes: 6cdf30375f82 ("powerpc/kvm/book3s: Use kvm helpers to walk shadow or secondary table") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611030559.75257-1-aik@ozlabs.ru
|
#
65fddcfc |
|
08-Jun-2020 |
Mike Rapoport <rppt@kernel.org> |
mm: reorder includes after introduction of linux/pgtable.h The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include of the latter in the middle of asm includes. Fix this up with the aid of the below script and manual adjustments here and there. import sys import re if len(sys.argv) is not 3: print "USAGE: %s <file> <header>" % (sys.argv[0]) sys.exit(1) hdr_to_move="#include <linux/%s>" % sys.argv[2] moved = False in_hdrs = False with open(sys.argv[1], "r") as f: lines = f.readlines() for _line in lines: line = _line.rstrip(' ') if line == hdr_to_move: continue if line.startswith("#include <linux/"): in_hdrs = True elif not moved and in_hdrs: moved = True print hdr_to_move print line Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ca5999fd |
|
08-Jun-2020 |
Mike Rapoport <rppt@kernel.org> |
mm: introduce include/linux/pgtable.h The include/linux/pgtable.h is going to be the home of generic page table manipulation functions. Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and make the latter include asm/pgtable.h. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8c99d345 |
|
26-Apr-2020 |
Tianjia Zhang <tianjia.zhang@linux.alibaba.com> |
KVM: PPC: Clean up redundant 'kvm_run' parameters In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu' structure. For historical reasons, many kvm-related function parameters retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This patch does a unified cleanup of these remaining redundant parameters. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
2610a57f |
|
26-Apr-2020 |
Tianjia Zhang <tianjia.zhang@linux.alibaba.com> |
KVM: PPC: Remove redundant kvm_run from vcpu_arch The 'kvm_run' field already exists in the 'vcpu' structure, which is the same structure as the 'kvm_run' in the 'vcpu_arch' and should be deleted. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
32e594f9 |
|
01-Apr-2020 |
Chen Zhou <chenzhou10@huawei.com> |
KVM: PPC: Book3S HV: Remove redundant NULL check Free function kfree() already does NULL check, so the additional check is unnecessary, just remove it. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
b691505e |
|
05-May-2020 |
Jordan Niethe <jniethe5@gmail.com> |
powerpc: Define new SRR1 bits for a ISA v3.1 Add the BOUNDARY SRR1 bit definition for when the cause of an alignment exception is a prefixed instruction that crosses a 64-byte boundary. Add the PREFIXED SRR1 bit definition for exceptions caused by prefixed instructions. Bit 35 of SRR1 is called SRR1_ISI_N_OR_G. This name comes from it being used to indicate that an ISI was due to the access being no-exec or guarded. ISA v3.1 adds another purpose. It is also set if there is an access in a cache-inhibited location for prefixed instruction. Rename from SRR1_ISI_N_OR_G to SRR1_ISI_N_G_OR_CIP. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-23-jniethe5@gmail.com
|
#
6cdf3037 |
|
04-May-2020 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc/kvm/book3s: Use kvm helpers to walk shadow or secondary table update kvmppc_hv_handle_set_rc to use find_kvm_nested_guest_pte and find_kvm_secondary_pte Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-12-aneesh.kumar@linux.ibm.com
|
#
dc891849 |
|
04-May-2020 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc/kvm/nested: Add helper to walk nested shadow linux page table. The locking rules for walking nested shadow linux page table is different from process scoped table. Hence add a helper for nested page table walk and also add check whether we are holding the right locks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-11-aneesh.kumar@linux.ibm.com
|
#
4b99412e |
|
04-May-2020 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc/kvm/book3s: Add helper to walk partition scoped linux page table. The locking rules for walking partition scoped table is different from process scoped table. Hence add a helper for secondary linux page table walk and also add check whether we are holding the right locks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-10-aneesh.kumar@linux.ibm.com
|
#
9ee6471e |
|
02-Oct-2019 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S: Define and use SRR1_MSR_BITS Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
13c7bb3c |
|
16-Sep-2019 |
Jordan Niethe <jniethe5@gmail.com> |
powerpc/64s: Set reserved PCR bits Currently the reserved bits of the Processor Compatibility Register (PCR) are cleared as per the Programming Note in Section 1.3.3 of version 3.0B of the Power ISA. This causes all new architecture features to be made available when running on newer processors with new architecture features added to the PCR as bits must be set to disable a given feature. For example to disable new features added as part of Version 2.07 of the ISA the corresponding bit in the PCR needs to be set. As new processor features generally require explicit kernel support they should be disabled until such support is implemented. Therefore kernels should set all unknown/reserved bits in the PCR such that any new architecture features which the kernel does not currently know about get disabled. An update is planned to the ISA to clarify that the PCR is an exception to the Programming Note on reserved bits in Section 1.3.3. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Tested-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190917004605.22471-2-alistair@popple.id.au
|
#
fd13daea |
|
02-Sep-2019 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: make mmu_partition_table_set_entry TLB flush optional No functional change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-4-npiggin@gmail.com
|
#
99161de3 |
|
02-Sep-2019 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/radix: tidy up TLB flushing code There should be no functional changes. - Use calls to existing radix_tlb.c functions in flush_partition. - Rename radix__flush_tlb_lpid to radix__flush_all_lpid and similar, because they flush everything, matching flush_all_mm rather than flush_tlb_mm for the lpid. - Remove some unused radix_tlb.c flush primitives. Signed-off: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-3-npiggin@gmail.com
|
#
90165d3d |
|
20-Dec-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Introduce kvmhv_update_nest_rmap_rc_list() Introduce a function kvmhv_update_nest_rmap_rc_list() which for a given nest_rmap list will traverse it, find the corresponding pte in the shadow page tables, and if it still maps the same host page update the rc bits accordingly. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
8b23eee4 |
|
20-Dec-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Apply combination of host and l1 pte rc for nested guest The shadow page table contains ptes for translations from nested guest address to host address. Currently when creating these ptes we take the rc bits from the pte for the L1 guest address to host address translation. This is incorrect as we must also factor in the rc bits from the pte for the nested guest address to L1 guest address translation (as contained in the L1 guest partition table for the nested guest). By not calculating these bits correctly L1 may not have been correctly notified when it needed to update its rc bits in the partition table it maintains for its nested guest. Modify the code so that the rc bits in the resultant pte for the L2->L0 translation are the 'and' of the rc bits in the L2->L1 pte and the L1->L0 pte, also accounting for whether this was a write access when setting the dirty bit. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
8400f874 |
|
20-Dec-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Align gfn to L1 page size when inserting nest-rmap entry Nested rmap entries are used to store the translation from L1 gpa to L2 gpa when entries are inserted into the shadow (nested) page tables. This rmap list is located by indexing the rmap array in the memslot by L1 gfn. When we come to search for these entries we only know the L1 page size (which could be PAGE_SIZE, 2M or a 1G page) and so can only select a gfn aligned to that size. This means that when we insert the entry, so we can find it later, we need to align the gfn we use to select the rmap list in which to insert the entry to L1 page size as well. By not doing this we were missing nested rmap entries when modifying L1 ptes which were for a page also passed through to an L2 guest. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
bec6e03b |
|
20-Dec-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Hold kvm->mmu_lock across updating nested pte rc bits We already hold the kvm->mmu_lock spin lock across updating the rc bits in the pte for the L1 guest. Continue to hold the lock across updating the rc bits in the pte for the nested guest as well to prevent invalidations from occurring. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
95d386c2 |
|
13-Dec-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guest Previously when a device was being emulated by an L1 guest for an L2 guest, that device couldn't then be passed through to an L3 guest. This was because the L1 guest had no method for accessing L3 memory. The hcall H_COPY_TOFROM_GUEST provides this access. Thus this setup for passthrough can now be allowed. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
6ff887b8 |
|
13-Dec-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants 1 & 2 A guest cannot access quadrants 1 or 2 as this would result in an exception. Thus introduce the hcall H_COPY_TOFROM_GUEST to be used by a guest when it wants to perform an access to quadrants 1 or 2, for example when it wants to access memory for one of its nested guests. Also provide an implementation for the kvm-hv module. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
873db2cd |
|
13-Dec-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guest Allow for a device which is being emulated at L0 (the host) for an L1 guest to be passed through to a nested (L2) guest. The existing kvmppc_hv_emulate_mmio function can be used here. The main challenge is that for a load the result must be stored into the L2 gpr, not an L1 gpr as would normally be the case after going out to qemu to complete the operation. This presents a challenge as at this point the L2 gpr state has been written back into L1 memory. To work around this we store the address in L1 memory of the L2 gpr where the result of the load is to be stored and use the new io_gpr value KVM_MMIO_REG_NESTED_GPR to indicate that this is a nested load for which completion must be done when returning back into the kernel. Then in kvmppc_complete_mmio_load() the resultant value is written into L1 memory at the location of the indicated L2 gpr. Note that we don't currently let an L1 guest emulate a device for an L2 guest which is then passed through to an L3 guest. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
d232afeb |
|
13-Dec-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix() There exists a function kvm_is_radix() which is used to determine if a kvm instance is using the radix mmu. However this only applies to the first level (L1) guest. Add a function kvmhv_vcpu_is_radix() which can be used to determine if the current execution context of the vcpu is radix, accounting for if the vcpu is running a nested guest. Currently all nested guests must be radix but this may change in the future. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
c43c3a86 |
|
11-Dec-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Cleanups - constify memslots, fix comments This adds 'const' to the declarations for the struct kvm_memory_slot pointer parameters of some functions, which will make it possible to call those functions from kvmppc_core_commit_memory_region_hv() in the next patch. This also fixes some comments about locking. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
83a05510 |
|
07-Oct-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Add nested shadow page tables to debugfs This adds a list of valid shadow PTEs for each nested guest to the 'radix' file for the guest in debugfs. This can be useful for debugging. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
10b5022d |
|
07-Oct-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Handle differing endianness for H_ENTER_NESTED The hcall H_ENTER_NESTED takes two parameters: the address in L1 guest memory of a hv_regs struct and the address of a pt_regs struct. The hcall requests the L0 hypervisor to use the register values in these structs to run a L2 guest and to return the exit state of the L2 guest in these structs. These are in the endianness of the L1 guest, rather than being always big-endian as is usually the case for PAPR hypercalls. This is convenient because it means that the L1 guest can pass the address of the regs field in its kvm_vcpu_arch struct. This also improves performance slightly by avoiding the need for two copies of the pt_regs struct. When reading/writing these structures, this patch handles the case where the endianness of the L1 guest differs from that of the L0 hypervisor, by byteswapping the structures after reading and before writing them back. Since all the fields of the pt_regs are of the same type, i.e., unsigned long, we treat it as an array of unsigned longs. The fields of struct hv_guest_state are not all the same, so its fields are byteswapped individually. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
73937deb |
|
07-Oct-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Sanitise hv_regs on nested guest entry restore_hv_regs() is used to copy the hv_regs L1 wants to set to run the nested (L2) guest into the vcpu structure. We need to sanitise these values to ensure we don't let the L1 guest hypervisor do things we don't want it to. We don't let data address watchpoints or completed instruction address breakpoints be set to match in hypervisor state. We also don't let L1 enable features in the hypervisor facility status and control register (HFSCR) for L2 which we have disabled for L1. That is L2 will get the subset of features which the L0 hypervisor has enabled for L1 and the features L1 wants to enable for L2. This could mean we give L1 a hypervisor facility unavailable interrupt for a facility it thinks it has enabled, however it shouldn't have enabled a facility it itself doesn't have for the L2 guest. We sanitise the registers when copying in the L2 hv_regs. We don't need to sanitise when copying back the L1 hv_regs since these shouldn't be able to contain invalid values as they're just what was copied out. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
9d0b048d |
|
07-Oct-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Invalidate TLB when nested vcpu moves physical cpu This is only done at level 0, since only level 0 knows which physical CPU a vcpu is running on. This does for nested guests what L0 already did for its own guests, which is to flush the TLB on a pCPU when it goes to run a vCPU there, and there is another vCPU in the same VM which previously ran on this pCPU and has now started to run on another pCPU. This is to handle the situation where the other vCPU touched a mapping, moved to another pCPU and did a tlbiel (local-only tlbie) on that new pCPU and thus left behind a stale TLB entry on this pCPU. This introduces a limit on the the vcpu_token values used in the H_ENTER_NESTED hcall -- they must now be less than NR_CPUS. [paulus@ozlabs.org - made prev_cpu array be short[] to reduce memory consumption.] Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
690ed4ca |
|
07-Oct-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Use hypercalls for TLB invalidation when nested This adds code to call the H_TLB_INVALIDATE hypercall when running as a guest, in the cases where we need to invalidate TLBs (or other MMU caches) as part of managing the mappings for a nested guest. Calling H_TLB_INVALIDATE lets the nested hypervisor inform the parent hypervisor about changes to partition-scoped page tables or the partition table without needing to do hypervisor-privileged tlbie instructions. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e3b6b466 |
|
07-Oct-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Implement H_TLB_INVALIDATE hcall When running a nested (L2) guest the guest (L1) hypervisor will use the H_TLB_INVALIDATE hcall when it needs to change the partition scoped page tables or the partition table which it manages. It will use this hcall in the situations where it would use a partition-scoped tlbie instruction if it were running in hypervisor mode. The H_TLB_INVALIDATE hcall can invalidate different scopes: Invalidate TLB for a given target address: - This invalidates a single L2 -> L1 pte - We need to invalidate any L2 -> L0 shadow_pgtable ptes which map the L2 address space which is being invalidated. This is because a single L2 -> L1 pte may have been mapped with more than one pte in the L2 -> L0 page tables. Invalidate the entire TLB for a given LPID or for all LPIDs: - Invalidate the entire shadow_pgtable for a given nested guest, or for all nested guests. Invalidate the PWC (page walk cache) for a given LPID or for all LPIDs: - We don't cache the PWC, so nothing to do. Invalidate the entire TLB, PWC and partition table for a given/all LPIDs: - Here we re-read the partition table entry and remove the nested state for any nested guest for which the first doubleword of the partition table entry is now zero. The H_TLB_INVALIDATE hcall takes as parameters the tlbie instruction word (of which only the RIC, PRS and R fields are used), the rS value (giving the lpid, where required) and the rB value (giving the IS, AP and EPN values). [paulus@ozlabs.org - adapted to having the partition table in guest memory, added the H_TLB_INVALIDATE implementation, removed tlbie instruction emulation, reworded the commit message.] Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
8cf531ed |
|
07-Oct-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Introduce rmap to track nested guest mappings When a host (L0) page which is mapped into a (L1) guest is in turn mapped through to a nested (L2) guest we keep a reverse mapping (rmap) so that these mappings can be retrieved later. Whenever we create an entry in a shadow_pgtable for a nested guest we create a corresponding rmap entry and add it to the list for the L1 guest memslot at the index of the L1 guest page it maps. This means at the L1 guest memslot we end up with lists of rmaps. When we are notified of a host page being invalidated which has been mapped through to a (L1) guest, we can then walk the rmap list for that guest page, and find and invalidate all of the corresponding shadow_pgtable entries. In order to reduce memory consumption, we compress the information for each rmap entry down to 52 bits -- 12 bits for the LPID and 40 bits for the guest real page frame number -- which will fit in a single unsigned long. To avoid a scenario where a guest can trigger unbounded memory allocations, we scan the list when adding an entry to see if there is already an entry with the contents we need. This can occur, because we don't ever remove entries from the middle of a list. A struct nested guest rmap is a list pointer and an rmap entry; ---------------- | next pointer | ---------------- | rmap entry | ---------------- Thus the rmap pointer for each guest frame number in the memslot can be either NULL, a single entry, or a pointer to a list of nested rmap entries. gfn memslot rmap array ------------------------- 0 | NULL | (no rmap entry) ------------------------- 1 | single rmap entry | (rmap entry with low bit set) ------------------------- 2 | list head pointer | (list of rmap entries) ------------------------- The final entry always has the lowest bit set and is stored in the next pointer of the last list entry, or as a single rmap entry. With a list of rmap entries looking like; ----------------- ----------------- ------------------------- | list head ptr | ----> | next pointer | ----> | single rmap entry | ----------------- ----------------- ------------------------- | rmap entry | | rmap entry | ----------------- ------------------------- Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
fd10be25 |
|
07-Oct-2018 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
KVM: PPC: Book3S HV: Handle page fault for a nested guest Consider a normal (L1) guest running under the main hypervisor (L0), and then a nested guest (L2) running under the L1 guest which is acting as a nested hypervisor. L0 has page tables to map the address space for L1 providing the translation from L1 real address -> L0 real address; L1 | | (L1 -> L0) | ----> L0 There are also page tables in L1 used to map the address space for L2 providing the translation from L2 real address -> L1 read address. Since the hardware can only walk a single level of page table, we need to maintain in L0 a "shadow_pgtable" for L2 which provides the translation from L2 real address -> L0 real address. Which looks like; L2 L2 | | | (L2 -> L1) | | | ----> L1 | (L2 -> L0) | | | (L1 -> L0) | | | ----> L0 --------> L0 When a page fault occurs while running a nested (L2) guest we need to insert a pte into this "shadow_pgtable" for the L2 -> L0 mapping. To do this we need to: 1. Walk the pgtable in L1 memory to find the L2 -> L1 mapping, and provide a page fault to L1 if this mapping doesn't exist. 2. Use our L1 -> L0 pgtable to convert this L1 address to an L0 address, or try to insert a pte for that mapping if it doesn't exist. 3. Now we have a L2 -> L0 mapping, insert this into our shadow_pgtable Once this mapping exists we can take rc faults when hardware is unable to automatically set the reference and change bits in the pte. On these we need to: 1. Check the rc bits on the L2 -> L1 pte match, and otherwise reflect the fault down to L1. 2. Set the rc bits in the L1 -> L0 pte which corresponds to the same host page. 3. Set the rc bits in the L2 -> L0 pte. As we reuse a large number of functions in book3s_64_mmu_radix.c for this we also needed to refactor a number of these functions to take an lpid parameter so that the correct lpid is used for tlb invalidations. The functionality however has remained the same. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
360cae31 |
|
07-Oct-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Nested guest entry via hypercall This adds a new hypercall, H_ENTER_NESTED, which is used by a nested hypervisor to enter one of its nested guests. The hypercall supplies register values in two structs. Those values are copied by the level 0 (L0) hypervisor (the one which is running in hypervisor mode) into the vcpu struct of the L1 guest, and then the guest is run until an interrupt or error occurs which needs to be reported to L1 via the hypercall return value. Currently this assumes that the L0 and L1 hypervisors are the same endianness, and the structs passed as arguments are in native endianness. If they are of different endianness, the version number check will fail and the hcall will be rejected. Nested hypervisors do not support indep_threads_mode=N, so this adds code to print a warning message if the administrator has set indep_threads_mode=N, and treat it as Y. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
8e3f5fc1 |
|
07-Oct-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization This starts the process of adding the code to support nested HV-style virtualization. It defines a new H_SET_PARTITION_TABLE hypercall which a nested hypervisor can use to set the base address and size of a partition table in its memory (analogous to the PTCR register). On the host (level 0 hypervisor) side, the H_SET_PARTITION_TABLE hypercall from the guest is handled by code that saves the virtual PTCR value for the guest. This also adds code for creating and destroying nested guests and for reading the partition table entry for a nested guest from L1 memory. Each nested guest has its own shadow LPID value, different in general from the LPID value used by the nested hypervisor to refer to it. The shadow LPID value is allocated at nested guest creation time. Nested hypervisor functionality is only available for a radix guest, which therefore means a radix host on a POWER9 (or later) processor. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|