.. SPDX-License-Identifier: GPL-2.0

====================================
Nested KVM on POWER
====================================

Introduction
============

This document explains how a guest operating system can act as a
hypervisor and run nested guests through the use of hypercalls, if the
hypervisor has implemented them. The terms L0, L1, and L2 are used to
refer to different software entities. L0 is the hypervisor mode entity
that would normally be called the "host" or "hypervisor". L1 is a
guest virtual machine that is directly run under L0 and is initiated
and controlled by L0. L2 is a guest virtual machine that is initiated
and controlled by L1 acting as a hypervisor.

Existing API
============

Linux/KVM has had support for nesting as an L0 or L1 since 2018.

The L0 code was added::

   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
   Author: Paul Mackerras <paulus@ozlabs.org>
   Date:   Mon Oct 8 16:31:03 2018 +1100
   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization

The L1 code was added::

   commit 360cae313702cdd0b90f82c261a8302fecef030a
   Author: Paul Mackerras <paulus@ozlabs.org>
   Date:   Mon Oct 8 16:31:04 2018 +1100
   KVM: PPC: Book3S HV: Nested guest entry via hypercall

This API works primarily using a single hcall, h_enter_nested(). This
call is made by the L1 to tell the L0 to start an L2 vCPU with the given
state. The L0 then starts this L2 and runs until an L2 exit condition
is reached. Once the L2 exits, the state of the L2 is given back to
the L1 by the L0. The full L2 vCPU state is always transferred from
and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
-> L1 exit).

The only state kept by the L0 is the partition table. The L1 registers
its partition table using the h_set_partition_table() hcall. All
other state held by the L0 about the L2s is cached state (such as
shadow page tables).

The L1 may run any L2 or vCPU without first informing the L0. It
simply starts the vCPU using h_enter_nested(). The creation of L2s and
vCPUs is done implicitly whenever h_enter_nested() is called.

In this document, we call this existing API the v1 API.

New PAPR API
============

The new PAPR API differs from the v1 API in that the creation of an L2
and its associated vCPUs is explicit. In this document, we call this
the v2 API.

h_enter_nested() is replaced with H_GUEST_RUN_VCPU(). Before this can
be called, the L1 must explicitly create the L2 using H_GUEST_CREATE()
and create any associated vCPUs using H_GUEST_CREATE_VCPU(). Getting
and setting vCPU state can also be performed using the
H_GUEST_{G,S}ET_STATE() hcalls.

The basic execution flow for an L1 to create an L2, run it, and
delete it is:

- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
  (normally at L1 boot time).

- L1 requests the L0 create an L2 with H_GUEST_CREATE() and receives a token

- L1 requests the L0 create an L2 vCPU with H_GUEST_CREATE_VCPU()

- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET_STATE() hcalls

- L1 requests the L0 run the vCPU using the H_GUEST_RUN_VCPU() hcall

- L1 deletes L2 with H_GUEST_DELETE()
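
The flow above can be sketched with a toy in-memory model of an L0. This is
only an illustration of the ordering rules: the ``MockL0`` class, its method
names, the guest ID allocator and the returned exit value are assumptions
made for this sketch, not the PAPR interface or the Linux implementation.

```python
class MockL0:
    """Toy stand-in for an L0; it only tracks which L2s and vCPUs exist."""

    def __init__(self):
        self.guests = {}       # guest ID -> set of vCPU IDs
        self.next_id = 1       # hypothetical guest ID allocator

    def guest_create(self):
        guest_id = self.next_id
        self.next_id += 1
        self.guests[guest_id] = set()
        return guest_id

    def guest_create_vcpu(self, guest_id, vcpu_id):
        # v2 is explicit: the L2 must exist before a vCPU is added.
        if guest_id not in self.guests:
            raise KeyError("unknown L2")
        self.guests[guest_id].add(vcpu_id)

    def guest_run_vcpu(self, guest_id, vcpu_id):
        # Unlike v1, running an L2 or vCPU that was never created fails.
        if vcpu_id not in self.guests.get(guest_id, ()):
            raise KeyError("unknown L2 or vCPU")
        return 0xC00           # pretend the L2 exited because of an hcall

    def guest_delete(self, guest_id):
        # Deleting the L2 also deletes all of its vCPUs.
        del self.guests[guest_id]


l0 = MockL0()
l2 = l0.guest_create()
l0.guest_create_vcpu(l2, vcpu_id=0)
exit_reason = l0.guest_run_vcpu(l2, vcpu_id=0)
l0.guest_delete(l2)
```

Running a vCPU after ``guest_delete()`` raises an error in this model, which
mirrors the v2 requirement that L2s and vCPUs are created before use.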

More details of the individual hcalls follow:

HCALL Details
=============

This documentation is provided to give an overall understanding of the
API. It doesn't aim to provide all the details required to implement
an L1 or L0. Refer to the latest version of PAPR for more details.

All these HCALLs are made by the L1 to the L0.

H_GUEST_GET_CAPABILITIES()
--------------------------

This is called to get the capabilities of the L0 nested
hypervisor. This includes capabilities such as the CPU versions (e.g.
POWER9, POWER10) that are supported as L2s::

  H_GUEST_GET_CAPABILITIES(uint64 flags)

  Parameters:
    Input:
      flags: Reserved
    Output:
      R3: Return code
      R4: Hypervisor Supported Capabilities bitmap 1

H_GUEST_SET_CAPABILITIES()
--------------------------

This is called to inform the L0 of the capabilities of the L1
hypervisor. The set of flags passed here is the same as for
H_GUEST_GET_CAPABILITIES().

Typically, GET will be called first and then SET will be called with a
subset of the flags returned from GET. This process allows the L0 and
L1 to negotiate an agreed set of capabilities::

  H_GUEST_SET_CAPABILITIES(uint64 flags,
                           uint64 capabilitiesBitmap1)
  Parameters:
    Input:
      flags: Reserved
      capabilitiesBitmap1: Only capabilities advertised through
                           H_GUEST_GET_CAPABILITIES
    Output:
      R3: Return code
      R4: If R3 = H_P2: The number of invalid bitmaps
      R5: If R3 = H_P2: The index of first invalid bitmap
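
The GET-then-SET negotiation amounts to intersecting two bitmaps. The sketch
below shows that step in isolation. The capability constants and the
``negotiate()`` helper are hypothetical names with made-up bit positions; the
real bit assignments are defined by PAPR.

```python
# Hypothetical capability bits, for illustration only.
CAP_POWER9 = 1 << 0
CAP_POWER10 = 1 << 1
CAP_FUTURE = 1 << 2


def negotiate(l0_caps, l1_caps):
    """Return the bitmap an L1 would pass to H_GUEST_SET_CAPABILITIES().

    l0_caps models the bitmap returned in R4 by H_GUEST_GET_CAPABILITIES();
    l1_caps is the set of capabilities the L1 itself can make use of.
    """
    agreed = l0_caps & l1_caps
    # SET may only contain capabilities that GET advertised.
    assert agreed & ~l0_caps == 0
    return agreed


l0_caps = CAP_POWER9 | CAP_POWER10     # what the L0 advertises
l1_caps = CAP_POWER10 | CAP_FUTURE     # what the L1 understands
agreed = negotiate(l0_caps, l1_caps)
```

Here only ``CAP_POWER10`` survives, since it is the one capability both sides
have in common.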

H_GUEST_CREATE()
----------------

This is called to create an L2. A unique ID of the L2 created
(similar to an LPID) is returned, which can be used on subsequent HCALLs to
identify the L2::

  H_GUEST_CREATE(uint64 flags,
                 uint64 continueToken);
  Parameters:
    Input:
      flags: Reserved
      continueToken: Initial call set to -1. Subsequent calls,
                     after H_Busy or H_LongBusyOrder has been
                     returned, value that was returned in R4.
    Output:
      R3: Return code. Notable:
        H_Not_Enough_Resources: Unable to create Guest VCPU due to not
        enough Hypervisor memory. See H_GUEST_GET_STATE(flags =
        takeOwnershipOfVcpuState)
      R4: If R3 = H_Busy or H_LongBusyOrder -> continueToken
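
The continueToken handling can be sketched as a retry loop. Everything in this
example is a stand-in: ``FakeL0`` is a hypothetical L0 that reports busy twice,
the return code values are placeholders, and returning the new L2 ID in R4 on
success is an assumption of the sketch rather than something stated above.

```python
H_SUCCESS, H_BUSY = 0, 1               # placeholder return codes
INITIAL_TOKEN = 0xFFFFFFFFFFFFFFFF     # -1 as an unsigned 64-bit value


class FakeL0:
    """Hypothetical L0 that reports busy twice before succeeding."""

    def __init__(self):
        self.seen_tokens = []

    def h_guest_create(self, flags, continue_token):
        self.seen_tokens.append(continue_token)
        if len(self.seen_tokens) < 3:
            # R3 = busy, R4 = token to pass on the next call
            return H_BUSY, 0x1000 + len(self.seen_tokens)
        return H_SUCCESS, 42           # assumed: R4 = ID of the new L2


def create_guest(l0):
    token = INITIAL_TOKEN              # the first call always passes -1
    while True:
        rc, r4 = l0.h_guest_create(0, token)
        if rc == H_BUSY:
            token = r4                 # retry with the token from R4
            continue
        return rc, r4


l0 = FakeL0()
rc, guest_id = create_guest(l0)
```

A real L1 would treat H_LongBusyOrder the same way, typically after waiting
for a while before retrying.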

H_GUEST_CREATE_VCPU()
---------------------

This is called to create a vCPU associated with an L2. The L2 ID
(returned from H_GUEST_CREATE()) should be passed in. Also passed in
is a unique (for this L2) vCPU ID. This vCPU ID is allocated by the
L1::

  H_GUEST_CREATE_VCPU(uint64 flags,
                      uint64 guestId,
                      uint64 vcpuId);
  Parameters:
    Input:
      flags: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU to be created. This must be within the
              range of 0 to 2047
    Output:
      R3: Return code. Notable:
        H_Not_Enough_Resources: Unable to create Guest VCPU due to not
        enough Hypervisor memory. See H_GUEST_GET_STATE(flags =
        takeOwnershipOfVcpuState)

H_GUEST_GET_STATE()
-------------------

This is called to get state associated with an L2 (Guest-wide or vCPU specific).
This info is passed via the Guest State Buffer (GSB), a standard format
explained later in this document. The necessary details are below.

This can get either L2 wide or vCPU specific information. Examples of
L2 wide state are the timebase offset or process scoped page table
info. Examples of vCPU specific state are GPRs or VSRs. A bit in the flags
parameter specifies if this call is L2 wide or vCPU specific and the
IDs in the GSB must match this.

The L1 provides a pointer to the GSB as a parameter to this call. Also
provided are the L2 and vCPU IDs associated with the state to get.

The L1 writes only the IDs and sizes in the GSB.  L0 writes the
associated values for each ID in the GSB::

  H_GUEST_GET_STATE(uint64 flags,
                    uint64 guestId,
                    uint64 vcpuId,
                    uint64 dataBuffer,
                    uint64 dataBufferSizeInBytes);
  Parameters:
    Input:
      flags:
         Bit 0: getGuestWideState: Request state of the Guest instead
           of an individual VCPU.
         Bit 1: takeOwnershipOfVcpuState: Indicate the L1 is taking
           over ownership of the VCPU state and that the L0 can free
           the storage holding the state. The VCPU state will need to
           be returned to the Hypervisor via H_GUEST_SET_STATE prior
           to H_GUEST_RUN_VCPU being called for this VCPU. The data
           returned in the dataBuffer is in a Hypervisor internal
           format.
         Bits 2-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
      dataBuffer: An L1 real address of the GSB.
        If takeOwnershipOfVcpuState, size must be at least the size
        returned by ID=0x0001
      dataBufferSizeInBytes: Size of dataBuffer
    Output:
      R3: Return code
      R4: If R3 = H_Invalid_Element_Id: The array index of the bad
            element ID.
          If R3 = H_Invalid_Element_Size: The array index of the bad
             element size.
          If R3 = H_Invalid_Element_Value: The array index of the bad
             element value.
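
As an illustration of "the L1 writes only the IDs and sizes", the sketch below
builds a request buffer for two guest-wide elements and leaves the value bytes
zeroed for the L0 to fill in. The helper names are hypothetical. The sketch
also assumes that flag bits use the numbering where bit 0 is the most
significant bit of the 64-bit word; check PAPR before relying on that.

```python
import struct


def flag_bit(n):
    # Assumption: bit 0 is the most significant bit of the 64-bit flags word.
    return 1 << (63 - n)


GET_GUEST_WIDE_STATE = flag_bit(0)     # hypothetical name for flags bit 0


def build_get_request(elements):
    """Build a GSB asking the L0 for the given (id, size) pairs."""
    buf = struct.pack(">I", len(elements))        # header: element count
    for elem_id, size in elements:
        # The L1 writes only the ID and the size; the value is left zeroed.
        buf += struct.pack(">HH", elem_id, size) + bytes(size)
    return buf


# Ask for two guest-wide elements: Logical PVR (0x0003, 4 bytes) and
# TB Offset (0x0004, 8 bytes).
request = build_get_request([(0x0003, 4), (0x0004, 8)])
flags = GET_GUEST_WIDE_STATE
```

The resulting buffer is 24 bytes long: a 4 byte header, then an 8 byte and a
12 byte element.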

H_GUEST_SET_STATE()
-------------------

This is called to set L2 wide or vCPU specific L2 state. This info is
passed via the Guest State Buffer (GSB). The necessary details are below.

This can set either L2 wide or vCPU specific information. Examples of
L2 wide state are the timebase offset or process scoped page table
info. Examples of vCPU specific state are GPRs or VSRs. A bit in the flags
parameter specifies if this call is L2 wide or vCPU specific and the
IDs in the GSB must match this.

The L1 provides a pointer to the GSB as a parameter to this call. Also
provided are the L2 and vCPU IDs associated with the state to set.

The L1 writes all values in the GSB and the L0 only reads the GSB for
this call::

  H_GUEST_SET_STATE(uint64 flags,
                    uint64 guestId,
                    uint64 vcpuId,
                    uint64 dataBuffer,
                    uint64 dataBufferSizeInBytes);
  Parameters:
    Input:
      flags:
         Bit 0: getGuestWideState: Request state of the Guest instead
           of an individual VCPU.
         Bit 1: returnOwnershipOfVcpuState: Return Guest VCPU state. See
           GET_STATE takeOwnershipOfVcpuState
         Bits 2-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
      dataBuffer: An L1 real address of the GSB.
        If returnOwnershipOfVcpuState, size must be at least the size
        returned by ID=0x0001
      dataBufferSizeInBytes: Size of dataBuffer
    Output:
      R3: Return code
      R4: If R3 = H_Invalid_Element_Id: The array index of the bad
            element ID.
          If R3 = H_Invalid_Element_Size: The array index of the bad
             element size.
          If R3 = H_Invalid_Element_Value: The array index of the bad
             element value.

H_GUEST_RUN_VCPU()
------------------

This is called to run an L2 vCPU. The L2 and vCPU IDs are passed in as
parameters. The vCPU runs with the state set previously using
H_GUEST_SET_STATE(). When the L2 exits, the L1 will resume from this
hcall.

This hcall also has associated input and output GSBs. Unlike
H_GUEST_{S,G}ET_STATE(), these GSB pointers are not passed in as
parameters to the hcall (this was done in the interest of
performance). The locations of these GSBs must be preregistered using
the H_GUEST_SET_STATE() call with ID 0x0c00 and 0x0c01 (see table
below).

The input GSB may contain only vCPU specific elements to be set. This
GSB may also contain zero elements (i.e. 0 in the first 4 bytes of the
GSB) if nothing needs to be set.

On exit from the hcall, the output buffer is filled with elements
determined by the L0. The reason for the exit is contained in GPR4 (i.e.
NIP is put in GPR4).  The elements returned depend on the exit
type. For example, if the exit reason is the L2 doing a hcall (GPR4 =
0xc00), then GPR3-12 are provided in the output GSB as this is the
state likely needed to service the hcall. If additional state is
needed, H_GUEST_GET_STATE() may be called by the L1.

To synthesize interrupts in the L2, when calling H_GUEST_RUN_VCPU()
the L1 may set a flag (as a hcall parameter) and the L0 will
synthesize the interrupt in the L2. Alternatively, the L1 may
synthesize the interrupt itself using H_GUEST_SET_STATE() or the
H_GUEST_RUN_VCPU() input GSB to set the state appropriately::

  H_GUEST_RUN_VCPU(uint64 flags,
                   uint64 guestId,
                   uint64 vcpuId);
  Parameters:
    Input:
      flags:
         Bit 0: generateExternalInterrupt: Generate an external interrupt
         Bit 1: generatePrivilegedDoorbell: Generate a Privileged Doorbell
         Bit 2: sendToSystemReset: Generate a System Reset Interrupt
         Bits 3-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
    Output:
      R3: Return code
      R4: If R3 = H_Success: The reason the L2 VCPU exited (i.e. NIA)
            0x000: The VCPU stopped running for an unspecified reason. An
              example of this is the Hypervisor stopping a VCPU running
              due to an outstanding interrupt for the Host Partition.
            0x980: HDEC
            0xC00: HCALL
            0xE00: HDSI
            0xE20: HISI
            0xE40: HEA
            0xF80: HV Fac Unavail
          If R3 = H_Invalid_Element_Id, H_Invalid_Element_Size, or
            H_Invalid_Element_Value: R4 is offset of the invalid element
            in the input buffer.
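
An L1 typically dispatches on the exit reason returned in R4. The sketch
below only maps the values listed above to their names. The dictionary and
the ``describe_exit()`` helper are hypothetical names used for illustration.

```python
# Exit reasons returned in R4 when R3 = H_Success, as listed above.
EXIT_REASONS = {
    0x000: "unspecified",
    0x980: "HDEC",
    0xC00: "HCALL",
    0xE00: "HDSI",
    0xE20: "HISI",
    0xE40: "HEA",
    0xF80: "HV Fac Unavail",
}


def describe_exit(r4):
    """Return a readable name for an exit reason, or flag it as unknown."""
    return EXIT_REASONS.get(r4, "unknown (0x%03X)" % r4)


reason = describe_exit(0xC00)
```

For an HCALL exit, the L1 would then read GPR3-12 from the output GSB to
service the request.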

H_GUEST_DELETE()
----------------

This is called to delete an L2. All associated vCPUs are also
deleted. No specific vCPU delete call is provided.

A flag may be provided to delete all guests. This is used to reset the
L0 in the case of kdump/kexec::

  H_GUEST_DELETE(uint64 flags,
                 uint64 guestId)
  Parameters:
    Input:
      flags:
         Bit 0: deleteAllGuests: deletes all guests
         Bits 1-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
    Output:
      R3: Return code

Guest State Buffer
==================

The Guest State Buffer (GSB) is the main method of communicating state
about the L2 between the L1 and L0 via the H_GUEST_{G,S}ET_STATE() and
H_GUEST_RUN_VCPU() calls.

State may be associated with a whole L2 (e.g. timebase offset) or a
specific L2 vCPU (e.g. GPR state). Only L2 vCPU state may be set by
H_GUEST_RUN_VCPU().

All data in the GSB is big endian (as is standard in PAPR).

The Guest State Buffer has a header which gives the number of
elements, followed by the GSB elements themselves.

GSB header:

+----------+----------+-------------------------------------------+
|  Offset  |  Size    |  Purpose                                  |
|  Bytes   |  Bytes   |                                           |
+==========+==========+===========================================+
|    0     |    4     |  Number of elements                       |
+----------+----------+-------------------------------------------+
|    4     |          |  Guest state buffer elements              |
+----------+----------+-------------------------------------------+

GSB element:

+----------+----------+-------------------------------------------+
|  Offset  |  Size    |  Purpose                                  |
|  Bytes   |  Bytes   |                                           |
+==========+==========+===========================================+
|    0     |    2     |  ID                                       |
+----------+----------+-------------------------------------------+
|    2     |    2     |  Size of Value                            |
+----------+----------+-------------------------------------------+
|    4     | As above |  Value                                    |
+----------+----------+-------------------------------------------+
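
The header and element layout above can be exercised with a short encoder and
decoder. This is a minimal sketch that assumes elements are packed back to
back with no padding between them, which the tables above do not state
explicitly. ``pack_gsb()`` and ``parse_gsb()`` are hypothetical helper names.

```python
import struct


def pack_gsb(elements):
    """Encode a list of (id, value_bytes) pairs as a big endian GSB."""
    out = struct.pack(">I", len(elements))            # number of elements
    for elem_id, value in elements:
        out += struct.pack(">HH", elem_id, len(value)) + value
    return out


def parse_gsb(buf):
    """Decode a GSB into a list of (id, value_bytes) pairs."""
    (count,) = struct.unpack_from(">I", buf, 0)
    offset = 4
    elements = []
    for _ in range(count):
        elem_id, size = struct.unpack_from(">HH", buf, offset)
        offset += 4
        elements.append((elem_id, buf[offset:offset + size]))
        offset += size
    return elements


gsb = pack_gsb([
    (0x1003, (0x1234).to_bytes(8, "big")),    # GPR 3, 8 byte value
    (0x2000, (0).to_bytes(4, "big")),         # CR, 4 byte value
])
decoded = parse_gsb(gsb)
```

An empty list encodes to four zero bytes, which is the zero-element GSB
permitted as H_GUEST_RUN_VCPU() input.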

The ID in the GSB element specifies what is to be set. This includes
architected state like GPRs, VSRs, SPRs, plus also some metadata about
the partition like the timebase offset and partition scoped page
table information.

+--------+-------+----+--------+----------------------------------+
|   ID   | Size  | RW | Thread | Details                          |
|        | Bytes |    | Guest  |                                  |
|        |       |    | Scope  |                                  |
+========+=======+====+========+==================================+
| 0x0000 |       | RW |   TG   | NOP element                      |
+--------+-------+----+--------+----------------------------------+
| 0x0001 | 0x08  | R  |   G    | Size of L0 vCPU state. See:      |
|        |       |    |        | H_GUEST_GET_STATE:               |
|        |       |    |        | flags = takeOwnershipOfVcpuState |
+--------+-------+----+--------+----------------------------------+
| 0x0002 | 0x08  | R  |   G    | Size Run vCPU out buffer         |
+--------+-------+----+--------+----------------------------------+
| 0x0003 | 0x04  | RW |   G    | Logical PVR                      |
+--------+-------+----+--------+----------------------------------+
| 0x0004 | 0x08  | RW |   G    | TB Offset (L1 relative)          |
+--------+-------+----+--------+----------------------------------+
| 0x0005 | 0x18  | RW |   G    |Partition scoped page tbl info:   |
|        |       |    |        |                                  |
|        |       |    |        |- 0x00 Addr part scope table      |
|        |       |    |        |- 0x08 Num addr bits              |
|        |       |    |        |- 0x10 Size root dir              |
+--------+-------+----+--------+----------------------------------+
| 0x0006 | 0x10  | RW |   G    |Process Table Information:        |
|        |       |    |        |                                  |
|        |       |    |        |- 0x0 Addr proc scope table       |
|        |       |    |        |- 0x8 Table size.                 |
+--------+-------+----+--------+----------------------------------+
| 0x0007-|       |    |        | Reserved                         |
| 0x0BFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x0C00 | 0x10  | RW |   T    |Run vCPU Input Buffer:            |
|        |       |    |        |                                  |
|        |       |    |        |- 0x0 Addr of buffer              |
|        |       |    |        |- 0x8 Buffer Size.                |
+--------+-------+----+--------+----------------------------------+
| 0x0C01 | 0x10  | RW |   T    |Run vCPU Output Buffer:           |
|        |       |    |        |                                  |
|        |       |    |        |- 0x0 Addr of buffer              |
|        |       |    |        |- 0x8 Buffer Size.                |
+--------+-------+----+--------+----------------------------------+
| 0x0C02 | 0x08  | RW |   T    | vCPU VPA Address                 |
+--------+-------+----+--------+----------------------------------+
| 0x0C03-|       |    |        | Reserved                         |
| 0x0FFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x1000-| 0x08  | RW |   T    | GPR 0-31                         |
| 0x101F |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x1020 | 0x08  | T  |   T    | HDEC expiry TB                   |
+--------+-------+----+--------+----------------------------------+
| 0x1021 | 0x08  | RW |   T    | NIA                              |
+--------+-------+----+--------+----------------------------------+
| 0x1022 | 0x08  | RW |   T    | MSR                              |
+--------+-------+----+--------+----------------------------------+
| 0x1023 | 0x08  | RW |   T    | LR                               |
+--------+-------+----+--------+----------------------------------+
| 0x1024 | 0x08  | RW |   T    | XER                              |
+--------+-------+----+--------+----------------------------------+
| 0x1025 | 0x08  | RW |   T    | CTR                              |
+--------+-------+----+--------+----------------------------------+
| 0x1026 | 0x08  | RW |   T    | CFAR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1027 | 0x08  | RW |   T    | SRR0                             |
+--------+-------+----+--------+----------------------------------+
| 0x1028 | 0x08  | RW |   T    | SRR1                             |
+--------+-------+----+--------+----------------------------------+
| 0x1029 | 0x08  | RW |   T    | DAR                              |
+--------+-------+----+--------+----------------------------------+
| 0x102A | 0x08  | RW |   T    | DEC expiry TB                    |
+--------+-------+----+--------+----------------------------------+
| 0x102B | 0x08  | RW |   T    | VTB                              |
+--------+-------+----+--------+----------------------------------+
| 0x102C | 0x08  | RW |   T    | LPCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x102D | 0x08  | RW |   T    | HFSCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x102E | 0x08  | RW |   T    | FSCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x102F | 0x08  | RW |   T    | FPSCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1030 | 0x08  | RW |   T    | DAWR0                            |
+--------+-------+----+--------+----------------------------------+
| 0x1031 | 0x08  | RW |   T    | DAWR1                            |
+--------+-------+----+--------+----------------------------------+
| 0x1032 | 0x08  | RW |   T    | CIABR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1033 | 0x08  | RW |   T    | PURR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1034 | 0x08  | RW |   T    | SPURR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1035 | 0x08  | RW |   T    | IC                               |
+--------+-------+----+--------+----------------------------------+
| 0x1036-| 0x08  | RW |   T    | SPRG 0-3                         |
| 0x1039 |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x103A | 0x08  | W  |   T    | PPR                              |
+--------+-------+----+--------+----------------------------------+
| 0x103B-| 0x08  | RW |   T    | MMCR 0-3                         |
| 0x103E |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x103F | 0x08  | RW |   T    | MMCRA                            |
+--------+-------+----+--------+----------------------------------+
| 0x1040 | 0x08  | RW |   T    | SIER                             |
+--------+-------+----+--------+----------------------------------+
| 0x1041 | 0x08  | RW |   T    | SIER 2                           |
+--------+-------+----+--------+----------------------------------+
| 0x1042 | 0x08  | RW |   T    | SIER 3                           |
+--------+-------+----+--------+----------------------------------+
| 0x1043 | 0x08  | RW |   T    | BESCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1044 | 0x08  | RW |   T    | EBBHR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1045 | 0x08  | RW |   T    | EBBRR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1046 | 0x08  | RW |   T    | AMR                              |
+--------+-------+----+--------+----------------------------------+
| 0x1047 | 0x08  | RW |   T    | IAMR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1048 | 0x08  | RW |   T    | AMOR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1049 | 0x08  | RW |   T    | UAMOR                            |
+--------+-------+----+--------+----------------------------------+
| 0x104A | 0x08  | RW |   T    | SDAR                             |
+--------+-------+----+--------+----------------------------------+
| 0x104B | 0x08  | RW |   T    | SIAR                             |
+--------+-------+----+--------+----------------------------------+
| 0x104C | 0x08  | RW |   T    | DSCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x104D | 0x08  | RW |   T    | TAR                              |
+--------+-------+----+--------+----------------------------------+
| 0x104E | 0x08  | RW |   T    | DEXCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x104F | 0x08  | RW |   T    | HDEXCR                           |
+--------+-------+----+--------+----------------------------------+
| 0x1050 | 0x08  | RW |   T    | HASHKEYR                         |
+--------+-------+----+--------+----------------------------------+
| 0x1051 | 0x08  | RW |   T    | HASHPKEYR                        |
+--------+-------+----+--------+----------------------------------+
| 0x1052 | 0x08  | RW |   T    | CTRL                             |
+--------+-------+----+--------+----------------------------------+
| 0x1053-|       |    |        | Reserved                         |
| 0x1FFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x2000 | 0x04  | RW |   T    | CR                               |
+--------+-------+----+--------+----------------------------------+
| 0x2001 | 0x04  | RW |   T    | PIDR                             |
+--------+-------+----+--------+----------------------------------+
| 0x2002 | 0x04  | RW |   T    | DSISR                            |
+--------+-------+----+--------+----------------------------------+
| 0x2003 | 0x04  | RW |   T    | VSCR                             |
55943420Smjacob+--------+-------+----+--------+----------------------------------+
56043420Smjacob| 0x2004 | 0x04  | RW |   T    | VRSAVE                           |
56135388Smjacob+--------+-------+----+--------+----------------------------------+
56235388Smjacob| 0x2005 | 0x04  | RW |   T    | DAWRX0                           |
56335388Smjacob+--------+-------+----+--------+----------------------------------+
56435388Smjacob| 0x2006 | 0x04  | RW |   T    | DAWRX1                           |
56535388Smjacob+--------+-------+----+--------+----------------------------------+
56635388Smjacob| 0x2007-| 0x04  | RW |   T    | PMC 1-6                          |
56735388Smjacob| 0x200c |       |    |        |                                  |
56835388Smjacob+--------+-------+----+--------+----------------------------------+
56935388Smjacob| 0x200D | 0x04  | RW |   T    | WORT                             |
57035388Smjacob+--------+-------+----+--------+----------------------------------+
57135388Smjacob| 0x200E | 0x04  | RW |   T    | PSPB                             |
57235388Smjacob+--------+-------+----+--------+----------------------------------+
57335388Smjacob| 0x200F-|       |    |        | Reserved                         |
57443420Smjacob| 0x2FFF |       |    |        |                                  |
57543420Smjacob+--------+-------+----+--------+----------------------------------+
57643420Smjacob| 0x3000-| 0x10  | RW |   T    | VSR 0-63                         |
57743420Smjacob| 0x303F |       |    |        |                                  |
57835388Smjacob+--------+-------+----+--------+----------------------------------+
57935388Smjacob| 0x3040-|       |    |        | Reserved                         |
58049907Smjacob| 0xEFFF |       |    |        |                                  |
58135388Smjacob+--------+-------+----+--------+----------------------------------+
58239235Sgibbs| 0xF000 | 0x08  | R  |   T    | HDAR                             |
58335388Smjacob+--------+-------+----+--------+----------------------------------+
58446971Smjacob| 0xF001 | 0x04  | R  |   T    | HDSISR                           |
58535388Smjacob+--------+-------+----+--------+----------------------------------+
58646971Smjacob| 0xF002 | 0x04  | R  |   T    | HEIR                             |
58735388Smjacob+--------+-------+----+--------+----------------------------------+
58835388Smjacob| 0xF003 | 0x08  | R  |   T    | ASDR                             |
58952350Smjacob+--------+-------+----+--------+----------------------------------+
59052350Smjacob
59139235Sgibbs
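As an illustration of how the element IDs in the rows above map to
element sizes, a minimal sketch of a lookup helper follows. The names
gsb_elem_size() and gsb_vsr_id() are hypothetical and are not part of
any kernel API. Only the rows from 0x103F onwards are covered; the
sketch returns 0 for reserved IDs and for any ID it does not cover.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical helper: size in bytes of a guest state element, by ID. */
   static size_t gsb_elem_size(uint16_t id)
   {
       if (id >= 0x103F && id <= 0x1052)
           return 8;    /* MMCRA ... CTRL */
       if (id >= 0x2000 && id <= 0x200E)
           return 4;    /* CR ... PSPB */
       if (id >= 0x3000 && id <= 0x303F)
           return 16;   /* VSR 0-63 */

       switch (id) {
       case 0xF000:     /* HDAR */
       case 0xF003:     /* ASDR */
           return 8;
       case 0xF001:     /* HDSISR */
       case 0xF002:     /* HEIR */
           return 4;
       }

       return 0;        /* reserved, or not covered by this sketch */
   }

   /* Hypothetical helper: element ID of VSR n, for n in 0..63. */
   static uint16_t gsb_vsr_id(unsigned int n)
   {
       assert(n < 64);
       return (uint16_t)(0x3000 + n);
   }

Under these assumptions, gsb_elem_size(0x3000) evaluates to 16 (VSR 0)
and gsb_elem_size(0x1053) evaluates to 0 (reserved).
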
Miscellaneous info
==================

State not in ptregs/hvregs
--------------------------

In the v1 API, some state is not in the ptregs/hvstate. This includes
the vector registers and some SPRs. For the L1 to set this state for
the L2, the L1 loads these hardware registers before the
h_enter_nested() call, and the L0 ensures they end up as the L2 state
(by not touching them).

The v2 API removes this implicit mechanism and instead sets this state
explicitly via the GSB.

L1 Implementation details: Caching state
----------------------------------------

In the v1 API, all state is sent from the L1 to the L0 and vice versa
on every h_enter_nested() hcall. If the L0 is not currently running
any L2s, the L0 has no state information about them. The only
exception to this is the location of the partition table, registered
via h_set_partition_table().

The v2 API changes this so that the L0 retains the L2 state even when
its vCPUs are no longer running. This means that the L1 only needs to
communicate with the L0 about L2 state when it needs to modify the L2
state, or when its own copy of a value is out of date. This provides
an opportunity for performance optimisation.

When a vCPU exits from an H_GUEST_RUN_VCPU() call, the L1 internally
marks all L2 state as invalid. This means that if the L1 wants to know
the L2 state (say via a kvm_get_one_reg() call), it needs to call
H_GUEST_GET_STATE() to get that state. Once it's read, it's marked as
valid in the L1 until the L2 is run again.

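The read side of this scheme can be sketched as below. This is a
simplified user-space model, not the kernel implementation: the
structure l1_vcpu_cache, the functions l1_invalidate_all() and
l1_get_reg(), the stub fake_h_guest_get_state() and the array l0_state
are all hypothetical names, and the real hcall operates on guest state
buffers rather than on a single register index.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   #define NR_REGS 4    /* illustrative number of cached elements */

   /* Stand-in for the state the L0 holds for one L2 vCPU. */
   static uint64_t l0_state[NR_REGS];
   static unsigned int nr_get_state_calls;

   /* Stand-in for an H_GUEST_GET_STATE() hcall fetching one element. */
   static uint64_t fake_h_guest_get_state(unsigned int reg)
   {
       nr_get_state_calls++;
       return l0_state[reg];
   }

   /* The L1's cached view of the L2 vCPU state. */
   struct l1_vcpu_cache {
       uint64_t val[NR_REGS];
       bool valid[NR_REGS];
   };

   /* Called after the run hcall returns: the L2 may have changed anything. */
   static void l1_invalidate_all(struct l1_vcpu_cache *c)
   {
       for (unsigned int i = 0; i < NR_REGS; i++)
           c->valid[i] = false;
   }

   /* Read one element, asking the L0 only if the cached copy is not valid. */
   static uint64_t l1_get_reg(struct l1_vcpu_cache *c, unsigned int reg)
   {
       assert(reg < NR_REGS);
       if (!c->valid[reg]) {
           c->val[reg] = fake_h_guest_get_state(reg);
           c->valid[reg] = true;
       }
       return c->val[reg];
   }

In this model, two consecutive l1_get_reg() calls for the same element
after one exit result in a single call to the stubbed hcall. The second
read is served from the L1 copy.
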
Also, when an L1 modifies L2 vCPU state, it doesn't need to write it
to the L0 until that L2 vCPU runs again. Hence when the L1 updates
state (say via a kvm_set_one_reg() call), it writes to an internal L1
copy and only flushes this copy to the L0 when the L2 runs again via
the H_GUEST_RUN_VCPU() input buffer.

This lazy updating of state by the L1 avoids unnecessary
H_GUEST_{G|S}ET_STATE() calls.

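The write side can be modelled in the same simplified way. Again, this
is only a sketch under stated assumptions: l1_vcpu_cache, l1_set_reg(),
fake_h_guest_run_vcpu(), l0_state and nr_elems_sent are hypothetical
names, and the stub copies dirty elements directly instead of building
a real input buffer.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   #define NR_REGS 4    /* illustrative number of cached elements */

   static uint64_t l0_state[NR_REGS];   /* stand-in for L0-held L2 state */
   static unsigned int nr_elems_sent;   /* elements passed on run hcalls */

   /* The L1's cached view of the L2 vCPU state. */
   struct l1_vcpu_cache {
       uint64_t val[NR_REGS];
       bool dirty[NR_REGS];
   };

   /* Update the L1 copy only; no hcall is made here. */
   static void l1_set_reg(struct l1_vcpu_cache *c, unsigned int reg,
                          uint64_t v)
   {
       assert(reg < NR_REGS);
       c->val[reg] = v;
       c->dirty[reg] = true;
   }

   /* Stand-in for the run hcall: dirty elements travel in its input buffer. */
   static void fake_h_guest_run_vcpu(struct l1_vcpu_cache *c)
   {
       for (unsigned int i = 0; i < NR_REGS; i++) {
           if (!c->dirty[i])
               continue;
           l0_state[i] = c->val[i];
           c->dirty[i] = false;
           nr_elems_sent++;
       }
       /* The L2 vCPU would run here. */
   }

In this model, setting the same element twice before a run sends it to
the L0 once, and a run with no modified state sends nothing.
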