History log of /netbsd-current/sys/arch/xen/x86/x86_xpmap.c
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 1.93 11-May-2024 andvar

s/boostrap/bootstrap/ in comment, warning message and documentation.


Revision tags: netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 thorpej-ifq-base thorpej-altq-separation-base netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base
# 1.92 20-Aug-2022 riastradh

x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.


# 1.91 11-May-2022 bouyer

In bootstrap, after switching to a new page table make sure that
now-unused memory is unmapped.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.90 06-Sep-2020 riastradh

Fix fallout from previous uvm.h cleanup.

- pmap(9) needs uvm/uvm_extern.h.

- x86/pmap.h is not usable on its own; it is only usable if included
via uvm/uvm_extern.h (-> uvm/uvm_pmap.h -> machine/pmap.h).

- Make nvmm.h and nvmm_internal.h standalone.


# 1.89 26-May-2020 bouyer

Ajust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()
Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which cause xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt()m which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grand entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().


# 1.88 06-May-2020 bouyer

xpq_queue_* use per-cpu queue; splvm() is enough to protect them.
remove the XXX SMP comments.


# 1.87 06-May-2020 bouyer

KASSERT() that the per-cpu queues are run at IPL_VM after boot.


# 1.86 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.85 30-Oct-2019 maxv

Switch to new PTE bits.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.84 09-Mar-2019 maxv

branches: 1.84.4;
Start replacing the x86 PTE bits.


# 1.83 07-Mar-2019 maxv

Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.


# 1.82 04-Feb-2019 cherry

Bump up XEN source API compatibility to 0x00030208 from 0x00030201,

but maintain backwards source API compilation compatibility.

ie; sources with config(5)
options __XEN_INTERFACE_VERSION__=0x00030201 # Xen 3.1 interface

should compile and run without problems.

Not that API version 0x00030201 is the lowest version we support now.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.81 29-Jul-2018 maxv

Reduce the confusion, rename a bunch of variables and reorg a little.
Tested on i386PAE-domU and amd64-dom0.


Revision tags: pgoyette-compat-0728
# 1.80 27-Jul-2018 maxv

Try to reduce the confusion, rename:

l2_4_count -> PDIRSZ
count -> nL2
bootstrap_tables -> our_tables
init_tables -> xen_tables

No functional change.


# 1.79 26-Jul-2018 maxv

Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.


# 1.78 26-Jul-2018 maxv

Retire XENDEBUG_LOW, and switch its only user to XENDEBUG.


# 1.77 26-Jul-2018 maxv

Merge the blocks. No functional change.


# 1.76 26-Jul-2018 maxv

Simplify the conditions; (PTP_LEVELS > 3) and (PTP_LEVELS > 2) are for
amd64, so use ifdef __x86_64__. No functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.75 24-Jun-2018 jdolecek

branches: 1.75.2;
mark with XXXSMP all remaining spl*() and tsleep() calls


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.74 16-Sep-2017 maxv

branches: 1.74.2;
Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.


Revision tags: netbsd-8-3-RELEASE netbsd-8-2-RELEASE netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.73 18-Mar-2017 maxv

Style, and remove debug code that does not work anyway.


# 1.72 08-Mar-2017 maxv

A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


# 1.92 20-Aug-2022 riastradh

x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.


# 1.91 11-May-2022 bouyer

In bootstrap, after switching to a new page table make sure that
now-unused memory is unmapped.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.90 06-Sep-2020 riastradh

Fix fallout from previous uvm.h cleanup.

- pmap(9) needs uvm/uvm_extern.h.

- x86/pmap.h is not usable on its own; it is only usable if included
via uvm/uvm_extern.h (-> uvm/uvm_pmap.h -> machine/pmap.h).

- Make nvmm.h and nvmm_internal.h standalone.


# 1.89 26-May-2020 bouyer

Ajust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()
Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which cause xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt()m which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grand entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().


# 1.88 06-May-2020 bouyer

xpq_queue_* use per-cpu queue; splvm() is enough to protect them.
remove the XXX SMP comments.


# 1.87 06-May-2020 bouyer

KASSERT() that the per-cpu queues are run at IPL_VM after boot.


# 1.86 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.85 30-Oct-2019 maxv

Switch to new PTE bits.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.84 09-Mar-2019 maxv

branches: 1.84.4;
Start replacing the x86 PTE bits.


# 1.83 07-Mar-2019 maxv

Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.


# 1.82 04-Feb-2019 cherry

Bump up XEN source API compatibility to 0x00030208 from 0x00030201,

but maintain backwards source API compilation compatibility.

ie; sources with config(5)
options __XEN_INTERFACE_VERSION__=0x00030201 # Xen 3.1 interface

should compile and run without problems.

Not that API version 0x00030201 is the lowest version we support now.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.81 29-Jul-2018 maxv

Reduce the confusion, rename a bunch of variables and reorg a little.
Tested on i386PAE-domU and amd64-dom0.


Revision tags: pgoyette-compat-0728
# 1.80 27-Jul-2018 maxv

Try to reduce the confusion, rename:

l2_4_count -> PDIRSZ
count -> nL2
bootstrap_tables -> our_tables
init_tables -> xen_tables

No functional change.


# 1.79 26-Jul-2018 maxv

Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.


# 1.78 26-Jul-2018 maxv

Retire XENDEBUG_LOW, and switch its only user to XENDEBUG.


# 1.77 26-Jul-2018 maxv

Merge the blocks. No functional change.


# 1.76 26-Jul-2018 maxv

Simplify the conditions; (PTP_LEVELS > 3) and (PTP_LEVELS > 2) are for
amd64, so use ifdef __x86_64__. No functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.75 24-Jun-2018 jdolecek

branches: 1.75.2;
mark with XXXSMP all remaining spl*() and tsleep() calls


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.74 16-Sep-2017 maxv

branches: 1.74.2;
Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.


Revision tags: netbsd-8-2-RELEASE netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.73 18-Mar-2017 maxv

Style, and remove debug code that does not work anyway.


# 1.72 08-Mar-2017 maxv

A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


# 1.91 11-May-2022 bouyer

In bootstrap, after switching to a new page table make sure that
now-unused memory is unmapped.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.90 06-Sep-2020 riastradh

Fix fallout from previous uvm.h cleanup.

- pmap(9) needs uvm/uvm_extern.h.

- x86/pmap.h is not usable on its own; it is only usable if included
via uvm/uvm_extern.h (-> uvm/uvm_pmap.h -> machine/pmap.h).

- Make nvmm.h and nvmm_internal.h standalone.


# 1.89 26-May-2020 bouyer

Ajust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()
Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which cause xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt()m which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grand entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().


# 1.88 06-May-2020 bouyer

xpq_queue_* use per-cpu queue; splvm() is enough to protect them.
remove the XXX SMP comments.


# 1.87 06-May-2020 bouyer

KASSERT() that the per-cpu queues are run at IPL_VM after boot.


# 1.86 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.85 30-Oct-2019 maxv

Switch to new PTE bits.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.84 09-Mar-2019 maxv

branches: 1.84.4;
Start replacing the x86 PTE bits.


# 1.83 07-Mar-2019 maxv

Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.


# 1.82 04-Feb-2019 cherry

Bump up XEN source API compatibility to 0x00030208 from 0x00030201,

but maintain backwards source API compilation compatibility.

ie; sources with config(5)
options __XEN_INTERFACE_VERSION__=0x00030201 # Xen 3.1 interface

should compile and run without problems.

Not that API version 0x00030201 is the lowest version we support now.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.81 29-Jul-2018 maxv

Reduce the confusion, rename a bunch of variables and reorg a little.
Tested on i386PAE-domU and amd64-dom0.


Revision tags: pgoyette-compat-0728
# 1.80 27-Jul-2018 maxv

Try to reduce the confusion, rename:

l2_4_count -> PDIRSZ
count -> nL2
bootstrap_tables -> our_tables
init_tables -> xen_tables

No functional change.


# 1.79 26-Jul-2018 maxv

Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.


# 1.78 26-Jul-2018 maxv

Retire XENDEBUG_LOW, and switch its only user to XENDEBUG.


# 1.77 26-Jul-2018 maxv

Merge the blocks. No functional change.


# 1.76 26-Jul-2018 maxv

Simplify the conditions; (PTP_LEVELS > 3) and (PTP_LEVELS > 2) are for
amd64, so use ifdef __x86_64__. No functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.75 24-Jun-2018 jdolecek

branches: 1.75.2;
mark with XXXSMP all remaining spl*() and tsleep() calls


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.74 16-Sep-2017 maxv

branches: 1.74.2;
Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.


Revision tags: netbsd-8-2-RELEASE netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.73 18-Mar-2017 maxv

Style, and remove debug code that does not work anyway.


# 1.72 08-Mar-2017 maxv

A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


# 1.90 06-Sep-2020 riastradh

Fix fallout from previous uvm.h cleanup.

- pmap(9) needs uvm/uvm_extern.h.

- x86/pmap.h is not usable on its own; it is only usable if included
via uvm/uvm_extern.h (-> uvm/uvm_pmap.h -> machine/pmap.h).

- Make nvmm.h and nvmm_internal.h standalone.


# 1.89 26-May-2020 bouyer

Ajust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()
Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which cause xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt()m which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grand entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().


# 1.88 06-May-2020 bouyer

xpq_queue_* use per-cpu queue; splvm() is enough to protect them.
remove the XXX SMP comments.


# 1.87 06-May-2020 bouyer

KASSERT() that the per-cpu queues are run at IPL_VM after boot.


# 1.86 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.85 30-Oct-2019 maxv

Switch to new PTE bits.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.84 09-Mar-2019 maxv

branches: 1.84.4;
Start replacing the x86 PTE bits.


# 1.83 07-Mar-2019 maxv

Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.


# 1.82 04-Feb-2019 cherry

Bump up XEN source API compatibility to 0x00030208 from 0x00030201,

but maintain backwards source API compilation compatibility.

ie; sources with config(5)
options __XEN_INTERFACE_VERSION__=0x00030201 # Xen 3.1 interface

should compile and run without problems.

Not that API version 0x00030201 is the lowest version we support now.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.81 29-Jul-2018 maxv

Reduce the confusion, rename a bunch of variables and reorg a little.
Tested on i386PAE-domU and amd64-dom0.


Revision tags: pgoyette-compat-0728
# 1.80 27-Jul-2018 maxv

Try to reduce the confusion, rename:

l2_4_count -> PDIRSZ
count -> nL2
bootstrap_tables -> our_tables
init_tables -> xen_tables

No functional change.


# 1.79 26-Jul-2018 maxv

Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.


# 1.78 26-Jul-2018 maxv

Retire XENDEBUG_LOW, and switch its only user to XENDEBUG.


# 1.77 26-Jul-2018 maxv

Merge the blocks. No functional change.


# 1.76 26-Jul-2018 maxv

Simplify the conditions; (PTP_LEVELS > 3) and (PTP_LEVELS > 2) are for
amd64, so use ifdef __x86_64__. No functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.75 24-Jun-2018 jdolecek

branches: 1.75.2;
mark with XXXSMP all remaining spl*() and tsleep() calls


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.74 16-Sep-2017 maxv

branches: 1.74.2;
Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.


Revision tags: netbsd-8-2-RELEASE netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.73 18-Mar-2017 maxv

Style, and remove debug code that does not work anyway.


# 1.72 08-Mar-2017 maxv

A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


# 1.89 26-May-2020 bouyer

Ajust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()
Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which cause xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt()m which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grand entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().


# 1.88 06-May-2020 bouyer

xpq_queue_* use per-cpu queue; splvm() is enough to protect them.
remove the XXX SMP comments.


# 1.87 06-May-2020 bouyer

KASSERT() that the per-cpu queues are run at IPL_VM after boot.


# 1.86 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.85 30-Oct-2019 maxv

Switch to new PTE bits.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.84 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.83 07-Mar-2019 maxv

Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.


# 1.82 04-Feb-2019 cherry

Bump up XEN source API compatibility to 0x00030208 from 0x00030201,

but maintain backwards source API compilation compatibility.

ie; sources with config(5)
options __XEN_INTERFACE_VERSION__=0x00030201 # Xen 3.1 interface

should compile and run without problems.

Not that API version 0x00030201 is the lowest version we support now.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.81 29-Jul-2018 maxv

Reduce the confusion, rename a bunch of variables and reorg a little.
Tested on i386PAE-domU and amd64-dom0.


Revision tags: pgoyette-compat-0728
# 1.80 27-Jul-2018 maxv

Try to reduce the confusion, rename:

l2_4_count -> PDIRSZ
count -> nL2
bootstrap_tables -> our_tables
init_tables -> xen_tables

No functional change.


# 1.79 26-Jul-2018 maxv

Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.


# 1.78 26-Jul-2018 maxv

Retire XENDEBUG_LOW, and switch its only user to XENDEBUG.


# 1.77 26-Jul-2018 maxv

Merge the blocks. No functional change.


# 1.76 26-Jul-2018 maxv

Simplify the conditions; (PTP_LEVELS > 3) and (PTP_LEVELS > 2) are for
amd64, so use ifdef __x86_64__. No functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.75 24-Jun-2018 jdolecek

branches: 1.75.2;
mark with XXXSMP all remaining spl*() and tsleep() calls


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.74 16-Sep-2017 maxv

branches: 1.74.2;
Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.


Revision tags: netbsd-8-2-RELEASE netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.73 18-Mar-2017 maxv

Style, and remove debug code that does not work anyway.


# 1.72 08-Mar-2017 maxv

A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


# 1.88 06-May-2020 bouyer

xpq_queue_* use per-cpu queue; splvm() is enough to protect them.
remove the XXX SMP comments.


# 1.87 06-May-2020 bouyer

KASSERT() that the per-cpu queues are run at IPL_VM after boot.


# 1.86 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.85 30-Oct-2019 maxv

Switch to new PTE bits.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.84 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.83 07-Mar-2019 maxv

Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.


# 1.82 04-Feb-2019 cherry

Bump up XEN source API compatibility to 0x00030208 from 0x00030201,

but maintain backwards source API compilation compatibility.

ie; sources with config(5)
options __XEN_INTERFACE_VERSION__=0x00030201 # Xen 3.1 interface

should compile and run without problems.

Not that API version 0x00030201 is the lowest version we support now.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.81 29-Jul-2018 maxv

Reduce the confusion, rename a bunch of variables and reorg a little.
Tested on i386PAE-domU and amd64-dom0.


Revision tags: pgoyette-compat-0728
# 1.80 27-Jul-2018 maxv

Try to reduce the confusion, rename:

l2_4_count -> PDIRSZ
count -> nL2
bootstrap_tables -> our_tables
init_tables -> xen_tables

No functional change.


# 1.79 26-Jul-2018 maxv

Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.


# 1.78 26-Jul-2018 maxv

Retire XENDEBUG_LOW, and switch its only user to XENDEBUG.


# 1.77 26-Jul-2018 maxv

Merge the blocks. No functional change.


# 1.76 26-Jul-2018 maxv

Simplify the conditions; (PTP_LEVELS > 3) and (PTP_LEVELS > 2) are for
amd64, so use ifdef __x86_64__. No functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.75 24-Jun-2018 jdolecek

branches: 1.75.2;
mark with XXXSMP all remaining spl*() and tsleep() calls


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.74 16-Sep-2017 maxv

branches: 1.74.2;
Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.


Revision tags: netbsd-8-2-RELEASE netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.73 18-Mar-2017 maxv

Style, and remove debug code that does not work anyway.


# 1.72 08-Mar-2017 maxv

A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


# 1.86 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.85 30-Oct-2019 maxv

Switch to new PTE bits.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.84 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.83 07-Mar-2019 maxv

Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.


# 1.82 04-Feb-2019 cherry

Bump up XEN source API compatibility to 0x00030208 from 0x00030201,

but maintain backwards source API compilation compatibility.

ie; sources with config(5)
options __XEN_INTERFACE_VERSION__=0x00030201 # Xen 3.1 interface

should compile and run without problems.

Not that API version 0x00030201 is the lowest version we support now.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.81 29-Jul-2018 maxv

Reduce the confusion, rename a bunch of variables and reorg a little.
Tested on i386PAE-domU and amd64-dom0.


Revision tags: pgoyette-compat-0728
# 1.80 27-Jul-2018 maxv

Try to reduce the confusion, rename:

l2_4_count -> PDIRSZ
count -> nL2
bootstrap_tables -> our_tables
init_tables -> xen_tables

No functional change.


# 1.79 26-Jul-2018 maxv

Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.


# 1.78 26-Jul-2018 maxv

Retire XENDEBUG_LOW, and switch its only user to XENDEBUG.


# 1.77 26-Jul-2018 maxv

Merge the blocks. No functional change.


# 1.76 26-Jul-2018 maxv

Simplify the conditions; (PTP_LEVELS > 3) and (PTP_LEVELS > 2) are for
amd64, so use ifdef __x86_64__. No functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.75 24-Jun-2018 jdolecek

branches: 1.75.2;
mark with XXXSMP all remaining spl*() and tsleep() calls


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.74 16-Sep-2017 maxv

branches: 1.74.2;
Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.


Revision tags: netbsd-8-2-RELEASE netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.73 18-Mar-2017 maxv

Style, and remove debug code that does not work anyway.


# 1.72 08-Mar-2017 maxv

A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


# 1.85 30-Oct-2019 maxv

Switch to new PTE bits.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.84 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.83 07-Mar-2019 maxv

Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.


# 1.82 04-Feb-2019 cherry

Bump up XEN source API compatibility to 0x00030208 from 0x00030201,

but maintain backwards source API compilation compatibility.

ie; sources with config(5)
options __XEN_INTERFACE_VERSION__=0x00030201 # Xen 3.1 interface

should compile and run without problems.

Not that API version 0x00030201 is the lowest version we support now.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.81 29-Jul-2018 maxv

Reduce the confusion, rename a bunch of variables and reorg a little.
Tested on i386PAE-domU and amd64-dom0.


Revision tags: pgoyette-compat-0728
# 1.80 27-Jul-2018 maxv

Try to reduce the confusion, rename:

l2_4_count -> PDIRSZ
count -> nL2
bootstrap_tables -> our_tables
init_tables -> xen_tables

No functional change.


# 1.79 26-Jul-2018 maxv

Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.


# 1.78 26-Jul-2018 maxv

Retire XENDEBUG_LOW, and switch its only user to XENDEBUG.


# 1.77 26-Jul-2018 maxv

Merge the blocks. No functional change.


# 1.76 26-Jul-2018 maxv

Simplify the conditions; (PTP_LEVELS > 3) and (PTP_LEVELS > 2) are for
amd64, so use ifdef __x86_64__. No functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.75 24-Jun-2018 jdolecek

branches: 1.75.2;
mark with XXXSMP all remaining spl*() and tsleep() calls


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.74 16-Sep-2017 maxv

branches: 1.74.2;
Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.


Revision tags: netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.73 18-Mar-2017 maxv

Style, and remove debug code that does not work anyway.


# 1.72 08-Mar-2017 maxv

A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


Revision tags: isaki-audio2-base
# 1.84 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.83 07-Mar-2019 maxv

Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.


# 1.82 04-Feb-2019 cherry

Bump up XEN source API compatibility to 0x00030208 from 0x00030201,

but maintain backwards source API compilation compatibility.

ie; sources with config(5)
options __XEN_INTERFACE_VERSION__=0x00030201 # Xen 3.1 interface

should compile and run without problems.

Not that API version 0x00030201 is the lowest version we support now.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.81 29-Jul-2018 maxv

Reduce the confusion, rename a bunch of variables and reorg a little.
Tested on i386PAE-domU and amd64-dom0.


Revision tags: pgoyette-compat-0728
# 1.80 27-Jul-2018 maxv

Try to reduce the confusion, rename:

l2_4_count -> PDIRSZ
count -> nL2
bootstrap_tables -> our_tables
init_tables -> xen_tables

No functional change.


# 1.79 26-Jul-2018 maxv

Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.


# 1.78 26-Jul-2018 maxv

Retire XENDEBUG_LOW, and switch its only user to XENDEBUG.


# 1.77 26-Jul-2018 maxv

Merge the blocks. No functional change.


# 1.76 26-Jul-2018 maxv

Simplify the conditions; (PTP_LEVELS > 3) and (PTP_LEVELS > 2) are for
amd64, so use ifdef __x86_64__. No functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.75 24-Jun-2018 jdolecek

mark with XXXSMP all remaining spl*() and tsleep() calls


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.74 16-Sep-2017 maxv

branches: 1.74.2;
Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.


Revision tags: netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.73 18-Mar-2017 maxv

Style, and remove debug code that does not work anyway.


# 1.72 08-Mar-2017 maxv

A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


# 1.74 16-Sep-2017 maxv

Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.73 18-Mar-2017 maxv

Style, and remove debug code that does not work anyway.


# 1.72 08-Mar-2017 maxv

A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


Revision tags: prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.73 18-Mar-2017 maxv

Style, and remove debug code that does not work anyway.


# 1.72 08-Mar-2017 maxv

A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


# 1.72 08-Mar-2017 maxv

A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


Revision tags: nick-nhusb-base-20170204
# 1.71 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


# 1.70 22-Jan-2017 maxv

Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


Revision tags: pgoyette-localcount-20170107
# 1.69 06-Jan-2017 maxv

Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.


# 1.68 16-Dec-2016 maxv

The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.


Revision tags: nick-nhusb-base-20161204
# 1.67 15-Nov-2016 maxv

Mmh, apparently I didn't properly test my previous change since it does not
compile anymore


# 1.66 15-Nov-2016 maxv

Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.


# 1.65 11-Nov-2016 maxv

Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.


# 1.64 11-Nov-2016 maxv

Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.


Revision tags: pgoyette-localcount-20161104
# 1.63 01-Nov-2016 maxv

Map the PTE space as non-executable on PAE. The same is already done on
amd64.


# 1.62 01-Nov-2016 maxv

Map the remaining pages as non-executable. Only text should have X.


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.61 25-Aug-2016 bouyer

Revert to 1.59 (adding back the W^X kernel mapings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.


# 1.60 23-Aug-2016 bouyer

Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2


# 1.59 11-Aug-2016 maxv

Make the I/O area non-executable on Xen.


Revision tags: pgoyette-localcount-20160806
# 1.58 03-Aug-2016 maxv

Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.


# 1.57 02-Aug-2016 maxv

Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.


# 1.56 02-Aug-2016 maxv

Use PG_RO instead of a magic zero.


# 1.55 02-Aug-2016 maxv

KNF, and use PAGE_SIZE instead of NBPG.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.54 29-May-2016 bouyer

branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features suppoted by the hypervisor for this
guest.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.53 06-May-2014 cherry

branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.52 10-Nov-2013 jnemeth

branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.


# 1.51 08-Nov-2013 christos

fix unused variable warnings


# 1.50 06-Nov-2013 mrg

- move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.49 16-Sep-2012 rmind

branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.


# 1.48 21-Aug-2012 bouyer

branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.


# 1.47 21-Aug-2012 rmind

Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).


# 1.46 30-Jun-2012 jym

Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.


# 1.45 27-Jun-2012 jym

Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).


# 1.44 06-Jun-2012 rmind

Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.43 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.42 02-Mar-2012 bouyer

MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.


Revision tags: jmcneill-usbmp-base5
# 1.41 24-Feb-2012 cherry

(xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).


Revision tags: jmcneill-usbmp-base3
# 1.40 23-Feb-2012 bouyer

On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediatly or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699


Revision tags: jmcneill-usbmp-base2
# 1.39 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: netbsd-6-base
# 1.38 12-Jan-2012 cherry

branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath


# 1.37 09-Jan-2012 cherry

Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.36 06-Nov-2011 cherry

branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.


# 1.35 06-Nov-2011 cherry

[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.


Revision tags: yamt-pagecache-base
# 1.34 20-Sep-2011 jym

branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.33 21-Aug-2011 jym

Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!


# 1.32 13-Aug-2011 cherry

Call the right function
(fix for an egregious error)


# 1.31 13-Aug-2011 cherry

Add locking around ops to the hypervisor MMU "queue".


# 1.30 13-Aug-2011 cherry

remove unnecessary locking overhead for UP


# 1.29 10-Aug-2011 cherry

Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops


# 1.28 15-Jun-2011 rmind

Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.


# 1.27 15-Jun-2011 rmind

- cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.26 08-May-2011 jym

branches: 1.26.2;
Print the PGD address in the debug message.


# 1.25 29-Mar-2011 jym

Typo fix.


Revision tags: uebayasi-xip-base7 bouyer-quota2-nbase bouyer-quota2-base
# 1.24 10-Feb-2011 jym

Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for chosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.23 20-Dec-2010 jym

branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...


# 1.22 19-Dec-2010 jym

Need the successful count (for AMI debugging)


Revision tags: uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.21 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.20 15-Jul-2010 jym

With Xen, PDPpaddr should contain a guest physical address (== PFN).


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.19 26-Feb-2010 jym

branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html


# 1.18 12-Feb-2010 jym

Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.17 23-Oct-2009 snj

branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).


# 1.16 19-Oct-2009 bouyer

Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7
# 1.15 29-Jul-2009 cegger

remove Xen2 support.
ok bouyer@


Revision tags: jymxensuspend-base
# 1.14 23-Jul-2009 jym

Fix typos in comments and __PRINTKs.


Revision tags: yamt-nfs-mp-base6
# 1.13 20-Jun-2009 cegger

sprintf -> snprintf. Wrap long lines.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base nick-hppapmap-base haad-dm-base mjf-devfs2-base
# 1.12 13-Nov-2008 cegger

branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.11 24-Oct-2008 jym

branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).


# 1.10 21-Oct-2008 cegger

introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.


Revision tags: haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.9 05-Sep-2008 tron

Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base
# 1.8 14-Apr-2008 cegger

branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.7 17-Feb-2008 bouyer

branches: 1.7.6;
The informations about console and store page number are long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.


# 1.6 23-Jan-2008 bouyer

Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handle the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost suport native
PAE, what's missing is bootstrap code in locore.S).


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.5 15-Jan-2008 bouyer

Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additionnal things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functionnal, and if the loaded kernel is close to the end of
the last L2 slot we loose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.


# 1.4 11-Jan-2008 bouyer

Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.


Revision tags: matt-armv6-base vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base vmlocking2-base1 vmlocking-nbase jmcneill-pm-base
# 1.3 23-Nov-2007 bouyer

branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.


# 1.2 22-Nov-2007 bouyer

Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.


Revision tags: bouyer-xenamd64-base jmcneill-base bouyer-xenamd64-base2
# 1.1 21-Oct-2007 bouyer

branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.