.. SPDX-License-Identifier: GPL-2.0

====================================
Nested KVM on POWER
====================================

Introduction
============

This document explains how a guest operating system can act as a
hypervisor and run nested guests through the use of hypercalls, if the
hypervisor has implemented them. The terms L0, L1, and L2 are used to
refer to different software entities. L0 is the hypervisor mode entity
that would normally be called the "host" or "hypervisor". L1 is a
guest virtual machine that is directly run under L0 and is initiated
and controlled by L0. L2 is a guest virtual machine that is initiated
and controlled by L1 acting as a hypervisor.

Existing API
============

Linux/KVM has had support for nesting as an L0 or L1 since 2018.

The L0 code was added::

   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
   Author: Paul Mackerras <paulus@ozlabs.org>
   Date:   Mon Oct 8 16:31:03 2018 +1100
   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization

The L1 code was added::

   commit 360cae313702cdd0b90f82c261a8302fecef030a
   Author: Paul Mackerras <paulus@ozlabs.org>
   Date:   Mon Oct 8 16:31:04 2018 +1100
   KVM: PPC: Book3S HV: Nested guest entry via hypercall

This API works primarily using a single hcall h_enter_nested(). This
call is made by the L1 to tell the L0 to start an L2 vCPU with the given
state.
The L0 then starts this L2 and runs until an L2 exit condition
is reached. Once the L2 exits, the state of the L2 is given back to
the L1 by the L0. The full L2 vCPU state is always transferred from
and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
-> L1 exit).

The only state kept by the L0 is the partition table. The L1 registers
its partition table using the h_set_partition_table() hcall. All
other state held by the L0 about the L2s is cached state (such as
shadow page tables).

The L1 may run any L2 or vCPU without first informing the L0. It
simply starts the vCPU using h_enter_nested(). The creation of L2s and
vCPUs is done implicitly whenever h_enter_nested() is called.

In this document, we call this existing API the v1 API.

New PAPR API
============

The new PAPR API changes from the v1 API so that the creation of an L2
and its associated vCPUs is explicit. In this document, we call this
the v2 API.

h_enter_nested() is replaced with H_GUEST_RUN_VCPU(). Before this can
be called, the L1 must explicitly create the L2 using H_GUEST_CREATE()
and create any associated vCPUs using H_GUEST_CREATE_VCPU(). Getting
and setting vCPU state can also be performed using the
H_GUEST_{G,S}ET_STATE() hcalls.

The basic execution flow for an L1 to create an L2, run it, and
delete it is:

- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
  (normally at L1 boot time).

- L1 requests the L0 create an L2 with H_GUEST_CREATE() and receives a token

- L1 requests the L0 create an L2 vCPU with H_GUEST_CREATE_VCPU()

- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET_STATE() hcalls

- L1 requests that the L0 run the vCPU using the H_GUEST_RUN_VCPU() hcall

- L1 deletes L2 with H_GUEST_DELETE()

More details of the individual hcalls follow:

HCALL Details
=============

This documentation is provided to give an overall understanding of the
API. It doesn't aim to provide all the details required to implement
an L1 or L0. Refer to the latest version of PAPR for more details.

All these HCALLs are made by the L1 to the L0.

H_GUEST_GET_CAPABILITIES()
--------------------------

This is called to get the capabilities of the L0 nested
hypervisor. This includes capabilities such as the CPU versions (e.g.
POWER9, POWER10) that are supported as L2s::

  H_GUEST_GET_CAPABILITIES(uint64 flags)

  Parameters:
    Input:
      flags: Reserved
    Output:
      R3: Return code
      R4: Hypervisor Supported Capabilities bitmap 1

H_GUEST_SET_CAPABILITIES()
--------------------------

This is called to inform the L0 of the capabilities of the L1
hypervisor.
The set of flags passed here is the same as for
H_GUEST_GET_CAPABILITIES().

Typically, GET will be called first and then SET will be called with a
subset of the flags returned from GET. This process allows the L0 and
L1 to negotiate an agreed set of capabilities::

  H_GUEST_SET_CAPABILITIES(uint64 flags,
                           uint64 capabilitiesBitmap1)
  Parameters:
    Input:
      flags: Reserved
      capabilitiesBitmap1: Only capabilities advertised through
                           H_GUEST_GET_CAPABILITIES
    Output:
      R3: Return code
      R4: If R3 = H_P2: The number of invalid bitmaps
      R5: If R3 = H_P2: The index of first invalid bitmap

H_GUEST_CREATE()
----------------

This is called to create an L2. A unique ID of the L2 created
(similar to an LPID) is returned, which can be used on subsequent HCALLs to
identify the L2::

  H_GUEST_CREATE(uint64 flags,
                 uint64 continueToken);
  Parameters:
    Input:
      flags: Reserved
      continueToken: Initial call set to -1. Subsequent calls,
                     after H_Busy or H_LongBusyOrder has been
                     returned, value that was returned in R4.
    Output:
      R3: Return code. Notable:
        H_Not_Enough_Resources: Unable to create Guest VCPU due to not
        enough Hypervisor memory. See H_GUEST_GET_STATE(flags =
        takeOwnershipOfVcpuState)
      R4: If R3 = H_Busy or H_LongBusyOrder -> continueToken

H_GUEST_CREATE_VCPU()
---------------------

This is called to create a vCPU associated with an L2. The L2 ID
(returned from H_GUEST_CREATE()) should be passed in. Also passed in
is a unique (for this L2) vCPU ID. This vCPU ID is allocated by the
L1::

  H_GUEST_CREATE_VCPU(uint64 flags,
                      uint64 guestId,
                      uint64 vcpuId);
  Parameters:
    Input:
      flags: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU to be created. This must be within the
              range of 0 to 2047
    Output:
      R3: Return code. Notable:
        H_Not_Enough_Resources: Unable to create Guest VCPU due to not
        enough Hypervisor memory. See H_GUEST_GET_STATE(flags =
        takeOwnershipOfVcpuState)

H_GUEST_GET_STATE()
-------------------

This is called to get state associated with an L2 (guest-wide or vCPU
specific). This information is passed via the Guest State Buffer (GSB),
a standard format explained later in this document. The necessary
details are given below.

This can get either L2 wide or vCPU specific information. Examples of
L2 wide state are the timebase offset or process scoped page table
info. Examples of vCPU specific state are GPRs or VSRs. A bit in the
flags parameter specifies if this call is L2 wide or vCPU specific and
the IDs in the GSB must match this.

The L1 provides a pointer to the GSB as a parameter to this call. Also
provided are the L2 and vCPU IDs associated with the state to get.

The L1 writes only the IDs and sizes in the GSB. The L0 writes the
associated values for each ID in the GSB::

  H_GUEST_GET_STATE(uint64 flags,
                    uint64 guestId,
                    uint64 vcpuId,
                    uint64 dataBuffer,
                    uint64 dataBufferSizeInBytes);
  Parameters:
    Input:
      flags:
         Bit 0: getGuestWideState: Request state of the Guest instead
           of an individual VCPU.
         Bit 1: takeOwnershipOfVcpuState: Indicate the L1 is taking
           over ownership of the VCPU state and that the L0 can free
           the storage holding the state. The VCPU state will need to
           be returned to the Hypervisor via H_GUEST_SET_STATE prior
           to H_GUEST_RUN_VCPU being called for this VCPU. The data
           returned in the dataBuffer is in a Hypervisor internal
           format.
         Bits 2-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
      dataBuffer: An L1 real address of the GSB.
        If takeOwnershipOfVcpuState, size must be at least the size
        returned by ID=0x0001
      dataBufferSizeInBytes: Size of dataBuffer
    Output:
      R3: Return code
      R4: If R3 = H_Invalid_Element_Id: The array index of the bad
            element ID.
          If R3 = H_Invalid_Element_Size: The array index of the bad
            element size.
          If R3 = H_Invalid_Element_Value: The array index of the bad
            element value.

H_GUEST_SET_STATE()
-------------------

This is called to set L2 wide or vCPU specific L2 state. This
information is passed via the Guest State Buffer (GSB). The necessary
details are given below.

This can set either L2 wide or vCPU specific information. Examples of
L2 wide state are the timebase offset or process scoped page table
info. Examples of vCPU specific state are GPRs or VSRs. A bit in the
flags parameter specifies if this call is L2 wide or vCPU specific and
the IDs in the GSB must match this.

The L1 provides a pointer to the GSB as a parameter to this call. Also
provided are the L2 and vCPU IDs associated with the state to set.

The L1 writes all values in the GSB and the L0 only reads the GSB for
this call::

  H_GUEST_SET_STATE(uint64 flags,
                    uint64 guestId,
                    uint64 vcpuId,
                    uint64 dataBuffer,
                    uint64 dataBufferSizeInBytes);
  Parameters:
    Input:
      flags:
         Bit 0: getGuestWideState: Set state of the Guest instead
           of an individual VCPU.
         Bit 1: returnOwnershipOfVcpuState: Return Guest VCPU state. See
           GET_STATE takeOwnershipOfVcpuState
         Bits 2-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
      dataBuffer: An L1 real address of the GSB.
        If returnOwnershipOfVcpuState, size must be at least the size
        returned by ID=0x0001
      dataBufferSizeInBytes: Size of dataBuffer
    Output:
      R3: Return code
      R4: If R3 = H_Invalid_Element_Id: The array index of the bad
            element ID.
          If R3 = H_Invalid_Element_Size: The array index of the bad
            element size.
          If R3 = H_Invalid_Element_Value: The array index of the bad
            element value.

H_GUEST_RUN_VCPU()
------------------

This is called to run an L2 vCPU. The L2 and vCPU IDs are passed in as
parameters. The vCPU runs with the state set previously using
H_GUEST_SET_STATE(). When the L2 exits, the L1 will resume from this
hcall.

This hcall also has associated input and output GSBs. Unlike
H_GUEST_{S,G}ET_STATE(), these GSB pointers are not passed in as
parameters to the hcall (this was done in the interest of
performance). The locations of these GSBs must be preregistered using
the H_GUEST_SET_STATE() call with IDs 0x0C00 and 0x0C01 (see table
below).

The input GSB may contain only vCPU specific elements to be set. This
GSB may also contain zero elements (i.e. 0 in the first 4 bytes of the
GSB) if nothing needs to be set.

On exit from the hcall, the output buffer is filled with elements
determined by the L0. The reason for the exit is contained in GPR4
(i.e. the NIP is put in GPR4). The elements returned depend on the exit
type.
For example, if the exit reason is the L2 doing a hcall (GPR4 =
0xc00), then GPR3-12 are provided in the output GSB as this is the
state likely needed to service the hcall. If additional state is
needed, H_GUEST_GET_STATE() may be called by the L1.

To synthesize interrupts in the L2, when calling H_GUEST_RUN_VCPU()
the L1 may set a flag (as a hcall parameter) and the L0 will
synthesize the interrupt in the L2. Alternatively, the L1 may
synthesize the interrupt itself using H_GUEST_SET_STATE() or the
H_GUEST_RUN_VCPU() input GSB to set the state appropriately::

  H_GUEST_RUN_VCPU(uint64 flags,
                   uint64 guestId,
                   uint64 vcpuId);
  Parameters:
    Input:
      flags:
         Bit 0: generateExternalInterrupt: Generate an external interrupt
         Bit 1: generatePrivilegedDoorbell: Generate a Privileged Doorbell
         Bit 2: sendToSystemReset: Generate a System Reset Interrupt
         Bits 3-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
    Output:
      R3: Return code
      R4: If R3 = H_Success: The reason the L2 VCPU exited (i.e. NIA)
            0x000: The VCPU stopped running for an unspecified reason. An
              example of this is the Hypervisor stopping a VCPU running
              due to an outstanding interrupt for the Host Partition.
            0x980: HDEC
            0xC00: HCALL
            0xE00: HDSI
            0xE20: HISI
            0xE40: HEA
            0xF80: HV Fac Unavail
          If R3 = H_Invalid_Element_Id, H_Invalid_Element_Size, or
            H_Invalid_Element_Value: R4 is the offset of the invalid
            element in the input buffer.

H_GUEST_DELETE()
----------------

This is called to delete an L2. All associated vCPUs are also
deleted. No specific vCPU delete call is provided.

A flag may be provided to delete all guests. This is used to reset the
L0 in the case of kdump/kexec::

  H_GUEST_DELETE(uint64 flags,
                 uint64 guestId)
  Parameters:
    Input:
      flags:
         Bit 0: deleteAllGuests: deletes all guests
         Bits 1-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
    Output:
      R3: Return code

Guest State Buffer
==================

The Guest State Buffer (GSB) is the main method of communicating state
about the L2 between the L1 and L0 via the H_GUEST_{G,S}ET_STATE() and
H_GUEST_RUN_VCPU() calls.

State may be associated with a whole L2 (e.g. timebase offset) or a
specific L2 vCPU (e.g. GPR state). Only L2 vCPU state may be set by
H_GUEST_RUN_VCPU().

All data in the GSB is big endian (as is standard in PAPR).

The Guest State Buffer has a header which gives the number of
elements, followed by the GSB elements themselves.

GSB header:

+----------+----------+-------------------------------------------+
|  Offset  |  Size    |  Purpose                                  |
|  Bytes   |  Bytes   |                                           |
+==========+==========+===========================================+
|    0     |    4     |  Number of elements                       |
+----------+----------+-------------------------------------------+
|    4     |          |  Guest state buffer elements              |
+----------+----------+-------------------------------------------+

GSB element:

+----------+----------+-------------------------------------------+
|  Offset  |  Size    |  Purpose                                  |
|  Bytes   |  Bytes   |                                           |
+==========+==========+===========================================+
|    0     |    2     |  ID                                       |
+----------+----------+-------------------------------------------+
|    2     |    2     |  Size of Value                            |
+----------+----------+-------------------------------------------+
|    4     | As above |  Value                                    |
+----------+----------+-------------------------------------------+

The ID in the GSB element specifies what is to be set. This includes
architected state like GPRs, VSRs, SPRs, plus also some metadata about
the partition like the timebase offset and partition scoped page
table information.
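
As an illustration of the header and element layouts above, the
following C sketch builds a GSB in a caller-supplied buffer. It is a
minimal sketch and not the kernel's implementation: ``struct gsb`` and
the helpers ``gsb_init()``, ``gsb_append()`` and ``gsb_append_u64()``
are hypothetical names. Only the byte layout (4-byte element count,
then 2-byte ID, 2-byte size and value per element) and the big endian
encoding are taken from this document.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   /* Hypothetical helper type: tracks a GSB being built in 'buf'. */
   struct gsb {
       uint8_t *buf;   /* caller-provided storage */
       size_t cap;     /* capacity of buf in bytes */
       size_t len;     /* bytes used so far, including the header */
   };

   /* Store 16-bit and 32-bit values in big endian byte order. */
   static void put_be16(uint8_t *p, uint16_t v)
   {
       p[0] = (uint8_t)(v >> 8);
       p[1] = (uint8_t)v;
   }

   static void put_be32(uint8_t *p, uint32_t v)
   {
       p[0] = (uint8_t)(v >> 24);
       p[1] = (uint8_t)(v >> 16);
       p[2] = (uint8_t)(v >> 8);
       p[3] = (uint8_t)v;
   }

   static uint32_t get_be32(const uint8_t *p)
   {
       return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
              ((uint32_t)p[2] << 8) | (uint32_t)p[3];
   }

   /* Start an empty GSB: a 4-byte header holding an element count of 0. */
   static int gsb_init(struct gsb *g, uint8_t *buf, size_t cap)
   {
       if (cap < 4)
           return -1;
       g->buf = buf;
       g->cap = cap;
       g->len = 4;
       put_be32(buf, 0);
       return 0;
   }

   /*
    * Append one element: 2-byte ID, 2-byte size, then 'size' value bytes.
    * 'value' must already be in big endian order. Returns -1 if it does
    * not fit.
    */
   static int gsb_append(struct gsb *g, uint16_t id, const void *value,
                         uint16_t size)
   {
       assert(g->len <= g->cap);
       if (g->cap - g->len < (size_t)4 + size)
           return -1;
       put_be16(g->buf + g->len, id);
       put_be16(g->buf + g->len + 2, size);
       if (size)
           memcpy(g->buf + g->len + 4, value, size);
       g->len += (size_t)4 + size;
       put_be32(g->buf, get_be32(g->buf) + 1);   /* bump element count */
       return 0;
   }

   /* Convenience wrapper for 8-byte elements. */
   static int gsb_append_u64(struct gsb *g, uint16_t id, uint64_t v)
   {
       uint8_t be[8];
       int i;

       for (i = 0; i < 8; i++)
           be[i] = (uint8_t)(v >> (56 - 8 * i));
       return gsb_append(g, id, be, sizeof(be));
   }

With these helpers, an L1 could prepare a buffer by calling
``gsb_init()`` once and then ``gsb_append_u64()`` for each 8-byte
element it wants to set. Because the element count lives in the header,
a buffer that has only been initialised is already a valid GSB with
zero elements.
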

+--------+-------+----+--------+----------------------------------+
|   ID   | Size  | RW | Thread | Details                          |
|        | Bytes |    | Guest  |                                  |
|        |       |    | Scope  |                                  |
+========+=======+====+========+==================================+
| 0x0000 |       | RW |   TG   | NOP element                      |
+--------+-------+----+--------+----------------------------------+
| 0x0001 | 0x08  | R  |   G    | Size of L0 vCPU state. See:      |
|        |       |    |        | H_GUEST_GET_STATE:               |
|        |       |    |        | flags = takeOwnershipOfVcpuState |
+--------+-------+----+--------+----------------------------------+
| 0x0002 | 0x08  | R  |   G    | Size Run vCPU out buffer         |
+--------+-------+----+--------+----------------------------------+
| 0x0003 | 0x04  | RW |   G    | Logical PVR                      |
+--------+-------+----+--------+----------------------------------+
| 0x0004 | 0x08  | RW |   G    | TB Offset (L1 relative)          |
+--------+-------+----+--------+----------------------------------+
| 0x0005 | 0x18  | RW |   G    | Partition scoped page tbl info:  |
|        |       |    |        |                                  |
|        |       |    |        | - 0x00 Addr part scope table     |
|        |       |    |        | - 0x08 Num addr bits             |
|        |       |    |        | - 0x10 Size root dir             |
+--------+-------+----+--------+----------------------------------+
| 0x0006 | 0x10  | RW |   G    | Process Table Information:       |
|        |       |    |        |                                  |
|        |       |    |        | - 0x0 Addr proc scope table      |
|        |       |    |        | - 0x8 Table size.                |
+--------+-------+----+--------+----------------------------------+
| 0x0007-|       |    |        | Reserved                         |
| 0x0BFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x0C00 | 0x10  | RW |   T    | Run vCPU Input Buffer:           |
|        |       |    |        |                                  |
|        |       |    |        | - 0x0 Addr of buffer             |
|        |       |    |        | - 0x8 Buffer Size.               |
+--------+-------+----+--------+----------------------------------+
| 0x0C01 | 0x10  | RW |   T    | Run vCPU Output Buffer:          |
|        |       |    |        |                                  |
|        |       |    |        | - 0x0 Addr of buffer             |
|        |       |    |        | - 0x8 Buffer Size.               |
+--------+-------+----+--------+----------------------------------+
| 0x0C02 | 0x08  | RW |   T    | vCPU VPA Address                 |
+--------+-------+----+--------+----------------------------------+
| 0x0C03-|       |    |        | Reserved                         |
| 0x0FFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x1000-| 0x08  | RW |   T    | GPR 0-31                         |
| 0x101F |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x1020 | 0x08  | RW |   T    | HDEC expiry TB                   |
+--------+-------+----+--------+----------------------------------+
| 0x1021 | 0x08  | RW |   T    | NIA                              |
+--------+-------+----+--------+----------------------------------+
| 0x1022 | 0x08  | RW |   T    | MSR                              |
+--------+-------+----+--------+----------------------------------+
| 0x1023 | 0x08  | RW |   T    | LR                               |
+--------+-------+----+--------+----------------------------------+
| 0x1024 | 0x08  | RW |   T    | XER                              |
+--------+-------+----+--------+----------------------------------+
| 0x1025 | 0x08  | RW |   T    | CTR                              |
+--------+-------+----+--------+----------------------------------+
| 0x1026 | 0x08  | RW |   T    | CFAR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1027 | 0x08  | RW |   T    | SRR0                             |
+--------+-------+----+--------+----------------------------------+
| 0x1028 | 0x08  | RW |   T    | SRR1                             |
+--------+-------+----+--------+----------------------------------+
| 0x1029 | 0x08  | RW |   T    | DAR                              |
+--------+-------+----+--------+----------------------------------+
| 0x102A | 0x08  | RW |   T    | DEC expiry TB                    |
+--------+-------+----+--------+----------------------------------+
| 0x102B | 0x08  | RW |   T    | VTB                              |
+--------+-------+----+--------+----------------------------------+
| 0x102C | 0x08  | RW |   T    | LPCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x102D | 0x08  | RW |   T    | HFSCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x102E | 0x08  | RW |   T    | FSCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x102F | 0x08  | RW |   T    | FPSCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1030 | 0x08  | RW |   T    | DAWR0                            |
+--------+-------+----+--------+----------------------------------+
| 0x1031 | 0x08  | RW |   T    | DAWR1                            |
+--------+-------+----+--------+----------------------------------+
| 0x1032 | 0x08  | RW |   T    | CIABR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1033 | 0x08  | RW |   T    | PURR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1034 | 0x08  | RW |   T    | SPURR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1035 | 0x08  | RW |   T    | IC                               |
+--------+-------+----+--------+----------------------------------+
| 0x1036-| 0x08  | RW |   T    | SPRG 0-3                         |
| 0x1039 |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x103A | 0x08  | W  |   T    | PPR                              |
+--------+-------+----+--------+----------------------------------+
| 0x103B-| 0x08  | RW |   T    | MMCR 0-3                         |
| 0x103E |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x103F | 0x08  | RW |   T    | MMCRA                            |
+--------+-------+----+--------+----------------------------------+
| 0x1040 | 0x08  | RW |   T    | SIER                             |
+--------+-------+----+--------+----------------------------------+
| 0x1041 | 0x08  | RW |   T    | SIER 2                           |
+--------+-------+----+--------+----------------------------------+
| 0x1042 | 0x08  | RW |   T    | SIER 3                           |
+--------+-------+----+--------+----------------------------------+
| 0x1043 | 0x08  | RW |   T    | BESCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1044 | 0x08  | RW |   T    | EBBHR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1045 | 0x08  | RW |   T    | EBBRR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1046 | 0x08  | RW |   T    | AMR                              |
+--------+-------+----+--------+----------------------------------+
| 0x1047 | 0x08  | RW |   T    | IAMR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1048 | 0x08  | RW |   T    | AMOR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1049 | 0x08  | RW |   T    | UAMOR                            |
+--------+-------+----+--------+----------------------------------+
| 0x104A | 0x08  | RW |   T    | SDAR                             |
+--------+-------+----+--------+----------------------------------+
| 0x104B | 0x08  | RW |   T    | SIAR                             |
+--------+-------+----+--------+----------------------------------+
| 0x104C | 0x08  | RW |   T    | DSCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x104D | 0x08  | RW |   T    | TAR                              |
+--------+-------+----+--------+----------------------------------+
| 0x104E | 0x08  | RW |   T    | DEXCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x104F | 0x08  | RW |   T    | HDEXCR                           |
+--------+-------+----+--------+----------------------------------+
| 0x1050 | 0x08  | RW |   T    | HASHKEYR                         |
+--------+-------+----+--------+----------------------------------+
| 0x1051 | 0x08  | RW |   T    | HASHPKEYR                        |
+--------+-------+----+--------+----------------------------------+
| 0x1052 | 0x08  | RW |   T    | CTRL                             |
+--------+-------+----+--------+----------------------------------+
| 0x1053-|       |    |        | Reserved                         |
| 0x1FFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x2000 | 0x04  | RW |   T    | CR                               |
+--------+-------+----+--------+----------------------------------+
| 0x2001 | 0x04  | RW |   T    | PIDR                             |
+--------+-------+----+--------+----------------------------------+
| 0x2002 | 0x04  | RW |   T    | DSISR                            |
+--------+-------+----+--------+----------------------------------+
| 0x2003 | 0x04  | RW |   T    | VSCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x2004 | 0x04  | RW |   T    | VRSAVE                           |
+--------+-------+----+--------+----------------------------------+
| 0x2005 | 0x04  | RW |   T    | DAWRX0                           |
+--------+-------+----+--------+----------------------------------+
| 0x2006 | 0x04  | RW |   T    | DAWRX1                           |
+--------+-------+----+--------+----------------------------------+
| 0x2007-| 0x04  | RW |   T    | PMC 1-6                          |
| 0x200C |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x200D | 0x04  | RW |   T    | WORT                             |
+--------+-------+----+--------+----------------------------------+
| 0x200E | 0x04  | RW |   T    | PSPB                             |
+--------+-------+----+--------+----------------------------------+
| 0x200F-|       |    |        | Reserved                         |
| 0x2FFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x3000-| 0x10  | RW |   T    | VSR 0-63                         |
| 0x303F |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x3040-|       |    |        | Reserved                         |
| 0xEFFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0xF000 | 0x08  | R  |   T    | HDAR                             |
+--------+-------+----+--------+----------------------------------+
| 0xF001 | 0x04  | R  |   T    | HDSISR                           |
+--------+-------+----+--------+----------------------------------+
| 0xF002 | 0x04  | R  |   T    | HEIR                             |
+--------+-------+----+--------+----------------------------------+
| 0xF003 | 0x08  | R  |   T    | ASDR                             |
+--------+-------+----+--------+----------------------------------+

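
To show how the element IDs in the table above might be used when
reading a buffer returned by the L0, the following C sketch walks a GSB
and looks up one element by ID. The function ``gsb_find()`` and the
``GSB_ID_*`` macro names are hypothetical and chosen for this example.
The ID values themselves (0x1000 + n for GPR n, 0x1021 for NIA,
0x3000 + n for VSR n) are taken from the table.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Element IDs from the table above; the macro names are hypothetical. */
   #define GSB_ID_GPR(n)  ((uint16_t)(0x1000 + (n)))   /* GPR 0-31 */
   #define GSB_ID_NIA     ((uint16_t)0x1021)
   #define GSB_ID_VSR(n)  ((uint16_t)(0x3000 + (n)))   /* VSR 0-63 */

   /* Read big endian 16-bit and 32-bit values. */
   static uint16_t rd_be16(const uint8_t *p)
   {
       return (uint16_t)((p[0] << 8) | p[1]);
   }

   static uint32_t rd_be32(const uint8_t *p)
   {
       return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
              ((uint32_t)p[2] << 8) | (uint32_t)p[3];
   }

   /*
    * Find the first element with the given ID. Returns a pointer to its
    * value and stores the value size in *size. Returns NULL if the ID is
    * not present or the buffer is too short for the elements it claims.
    */
   static const uint8_t *gsb_find(const uint8_t *buf, size_t len,
                                  uint16_t id, uint16_t *size)
   {
       uint32_t count, i;
       size_t off = 4;   /* elements start right after the header */

       assert(buf != NULL && size != NULL);
       if (len < 4)
           return NULL;
       count = rd_be32(buf);
       for (i = 0; i < count; i++) {
           uint16_t eid, esz;

           if (len - off < 4)
               return NULL;           /* truncated element header */
           eid = rd_be16(buf + off);
           esz = rd_be16(buf + off + 2);
           if (len - off - 4 < esz)
               return NULL;           /* truncated element value */
           if (eid == id) {
               *size = esz;
               return buf + off + 4;
           }
           off += (size_t)4 + esz;
       }
       return NULL;
   }

Since each element carries its own size, the walker does not need to
know the size of every ID in advance. It does need to bounds-check each
step, which is why both the element header and the value length are
validated against the remaining buffer.
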

Miscellaneous info
==================

State not in ptregs/hvregs
--------------------------

In the v1 API, some state is not in the ptregs/hvstate. This includes
the vector registers and some SPRs. For the L1 to set this state for
the L2, the L1 loads up these hardware registers before the
h_enter_nested() call and the L0 ensures they end up as the L2 state
(by not touching them).

The v2 API removes this and explicitly sets this state via the GSB.

L1 Implementation details: Caching state
----------------------------------------

In the v1 API, all state is sent from the L1 to the L0 and vice versa
on every h_enter_nested() hcall. If the L0 is not currently running
any L2s, the L0 has no state information about them. The only
exception to this is the location of the partition table, registered
via h_set_partition_table().

The v2 API changes this so that the L0 retains the L2 state even when
its vCPUs are no longer running. This means that the L1 only needs to
communicate with the L0 about L2 state when it needs to modify the L2
state, or when its value is out of date. This provides an opportunity
for performance optimisation.

When a vCPU exits from a H_GUEST_RUN_VCPU() call, the L1 internally
marks all L2 state as invalid. This means that if the L1 wants to know
the L2 state (say via a kvm_get_one_reg() call), it needs to call
H_GUEST_GET_STATE() to get that state. Once it's read, it's marked as
valid in L1 until the L2 is run again.

Also, when an L1 modifies L2 vCPU state, it doesn't need to write it
to the L0 until that L2 vCPU runs again. Hence when the L1 updates
state (say via a kvm_set_one_reg() call), it writes to an internal L1
copy and only flushes this copy to the L0 when the L2 runs again via
the H_GUEST_RUN_VCPU() input buffer.

This lazy updating of state by the L1 avoids unnecessary
H_GUEST_{G,S}ET_STATE() calls.
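
The caching scheme described in this section can be sketched with a
pair of bitmaps, one for valid entries and one for dirty entries. The
C code below is a simplified model and not the kernel's implementation.
All names are hypothetical, the register file is reduced to four
entries, and the hcalls are replaced by stubs that count how often they
are called. It also treats every entry as invalid after a run, without
modelling the elements the L0 returns in the output buffer.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define NR_REGS 4   /* tiny register file, for illustration only */

   /* Stand-in for the L0's copy of the L2 state, plus stub call counters. */
   static uint64_t l0_state[NR_REGS];
   static unsigned int nr_get_calls, nr_run_calls;

   struct l1_vcpu_cache {
       uint64_t val[NR_REGS];
       uint32_t valid;   /* bit n set: val[n] is up to date */
       uint32_t dirty;   /* bit n set: val[n] must reach L0 before the next run */
   };

   /* Stub standing in for an H_GUEST_GET_STATE() of one element. */
   static uint64_t stub_get_state(int reg)
   {
       nr_get_calls++;
       return l0_state[reg];
   }

   /* Stub standing in for H_GUEST_RUN_VCPU(); dirty values model the input GSB. */
   static void stub_run_vcpu(const struct l1_vcpu_cache *c)
   {
       int i;

       nr_run_calls++;
       for (i = 0; i < NR_REGS; i++)
           if (c->dirty & (1u << i))
               l0_state[i] = c->val[i];
       l0_state[0]++;   /* pretend the L2 changed register 0 while running */
   }

   /* Read a register, fetching it from the L0 only if the cached copy is stale. */
   static uint64_t cache_get_reg(struct l1_vcpu_cache *c, int reg)
   {
       assert(reg >= 0 && reg < NR_REGS);
       if (!(c->valid & (1u << reg))) {
           c->val[reg] = stub_get_state(reg);
           c->valid |= 1u << reg;
       }
       return c->val[reg];
   }

   /* Write a register into the L1 copy only; it is flushed on the next run. */
   static void cache_set_reg(struct l1_vcpu_cache *c, int reg, uint64_t v)
   {
       assert(reg >= 0 && reg < NR_REGS);
       c->val[reg] = v;
       c->valid |= 1u << reg;
       c->dirty |= 1u << reg;
   }

   /* Run the vCPU: flush dirty state, then mark everything invalid on exit. */
   static void cache_run(struct l1_vcpu_cache *c)
   {
       stub_run_vcpu(c);
       c->dirty = 0;
       c->valid = 0;
   }

In this model, setting a register costs no hcall at all, and reading
the same register twice between runs costs at most one get call. The
second read is served from the L1 copy.
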