Cross Reference: /netbsd-current/sys/arch/x86/x86/fpu.c

History log of /netbsd-current/sys/arch/x86/x86/fpu.c
Revision	Date	Author	Comments
# 1.89	21-Jun-2024	riastradh	x86/fpu.c: Nix trailing whitespace. No functional change intended.
# 1.88	17-May-2024	manu	Workaround panic: fpudna from userland. i386 Xen PV domU gets spurious fpudna traps from userland. Older eager FPU context switching code took care of ignoring them. When transitioning from eager switching to always switching, this special handling was removed, causing "fpudna from userland" panics. This change restores the previous behavior where fpudna traps from userland are ignored on Xen PV domU.
Revision tags: thorpej-ifq-base thorpej-altq-separation-base
# 1.87	18-Jul-2023	riastradh	x86/fpu: In kernel mode fpu traps, print the instruction pointer.
# 1.86	03-Mar-2023	riastradh	x86/fpu: Align savefpu to 64 bytes in fpuinit_mxcsr_mask. 16 bytes is not enough. (Is this why it never worked on Xen some years back? Got lucky and accidentally had 64-byte alignment on native x86, but not in the call stack in Xen?) XXX pullup-10
# 1.85	03-Mar-2023	riastradh	Revert "x86: Add kthread_fpu_enter/exit support, take two." kthread_fpu_enter/exit changes broke some hardware, unclear why, to investigate before fixing and reapplying these changes.
# 1.84	03-Mar-2023	riastradh	Revert "x86/fpu.c: Sprinkle KNF." kthread_fpu_enter/exit changes broke some hardware, unclear why, to investigate before fixing and reapplying these changes.
# 1.83	25-Feb-2023	riastradh	x86/fpu.c: Sprinkle KNF. No functional change intended.
# 1.82	25-Feb-2023	riastradh	x86: Add kthread_fpu_enter/exit support, take two. This time, make sure to restore the FPU state when switching to a kthread in the middle of kthread_fpu_enter/exit. This adds a single predicted-taken branch for the case of kthreads that are not in kthread_fpu_enter/exit, so it incurs a penalty only for threads that actually use it. Since it avoids FPU state switching in kthreads that do use the FPU, namely cgd worker threads, this should be a net performance win on systems using it and have negligible impact otherwise. XXX pullup-10
# 1.81	25-Feb-2023	riastradh	x86: Label boolean is_64bit argument to fpu_area_restore. No functional change intended.
# 1.80	25-Feb-2023	riastradh	x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use. In fpu_kern_enter, make sure all the MXCSR exception status bits are set when we start using the FPU, so that instructions which exhibit MCDT are unaffected by it. While here, zero all the other FPU registers in fpu_kern_enter. In principle we could skip this step on future CPUs that fix the MCDT bug, but there's probably not much benefit -- workloads that do a lot of crypto in the kernel are probably better off using kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles in the first place. For details, see: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.79	20-Aug-2022	riastradh	branches: 1.79.4; fpu_kern_enter/leave: Disable IPL assertions. These don't work because mutex_enter/exit on a spin lock may raise an IPL but not lower it, if another spin lock was already held. For example, mutex_enter(some_lock_at_IPL_VM); printf("foo\n"); fpu_kern_enter(); ... fpu_kern_leave(); mutex_exit(some_lock_at_IPL_VM); will trigger the panic, because printf takes a lock at IPL_HIGH where the IPL will remain until the mutex_exit. (This was a nightmare to track down before I remembered that detail of spin lock IPL semantics...)
# 1.78	24-May-2022	andvar	fix various typos in comments, docs and log messages.
# 1.77	01-Apr-2022	riastradh	x86, arm: Allow fpu_kern_enter/leave while cold. Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point. Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited. Also initialize cpu0's ci_kfpu_spl.
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.76	24-Oct-2020	mgorny	Issue 64-bit versions of XSAVE for 64-bit amd64 programs When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area. The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs. The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code. This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
# 1.75	15-Oct-2020	mgorny	Revert "Merge convert_xmm_s87.c into fpu.c" I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
# 1.74	02-Aug-2020	riastradh	Revert "Add kthread_fpu_enter/exit support to x86." for now. Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
# 1.73	01-Aug-2020	riastradh	Add kthread_fpu_enter/exit support to x86.
# 1.72	20-Jul-2020	riastradh	Fix fpu_kern_enter in a softint that interrupted a softint. We need to find the lwp that was originally interrupted to save its fpu state. With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
# 1.71	20-Jul-2020	riastradh	Save fpu state at IPL_VM to exclude fpu_kern_enter/leave. This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not. (I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
# 1.70	20-Jul-2020	riastradh	Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint." This only fixed part of the race, and we can do it more simply.
# 1.69	20-Jul-2020	riastradh	Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled." This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
# 1.68	13-Jul-2020	riastradh	Limit x86 fpu_kern_enter/leave to IPL_VM or below. There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads. This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation). KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear, fpu_save_area_reset -> fpu_sigreset. Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be; only the xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication; tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3; fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2, do float operation, switch lwp to cpu3, do float operation. Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. x86 without an FPU is not supported. Also delete the newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt), machdep.xsave_size = 832, machdep.xsave_features = 7. Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE, as it should be; only the Xen case checks the flag now; need to investigate further why exactly the fault happens for the Xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication; tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so it should also work on those AMD CPUs which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3. Fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address. XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. no FPU x86 is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.79	20-Aug-2022	riastradh	fpu_kern_enter/leave: Disable IPL assertions. These don't work because mutex_enter/exit on a spin lock may raise an IPL but not lower it, if another spin lock was already held. For example, mutex_enter(some_lock_at_IPL_VM); printf("foo\n"); fpu_kern_enter(); ... fpu_kern_leave(); mutex_exit(some_lock_at_IPL_VM); will trigger the panic, because printf takes a lock at IPL_HIGH where the IPL wil remain until the mutex_exit. (This was a nightmare to track down before I remembered that detail of spin lock IPL semantics...)
# 1.78	24-May-2022	andvar	fix various typos in comments, docs and log messages.
# 1.77	01-Apr-2022	riastradh	x86, arm: Allow fpu_kern_enter/leave while cold. Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point. Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited. Also initialize cpu0's ci_kfpu_spl.
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.76	24-Oct-2020	mgorny	Issue 64-bit versions of XSAVE for 64-bit amd64 programs When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area. The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs. The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code. This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
# 1.75	15-Oct-2020	mgorny	Revert "Merge convert_xmm_s87.c into fpu.c" I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
# 1.74	02-Aug-2020	riastradh	Revert "Add kthread_fpu_enter/exit support to x86." for now. Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
# 1.73	01-Aug-2020	riastradh	Add kthread_fpu_enter/exit support to x86.
# 1.72	20-Jul-2020	riastradh	Fix fpu_kern_enter in a softint that interrupted a softint. We need to find the lwp that was originally interrupted to save its fpu state. With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
# 1.71	20-Jul-2020	riastradh	Save fpu state at IPL_VM to exclude fpu_kern_enter/leave. This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not. (I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
# 1.70	20-Jul-2020	riastradh	Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint." This only fixed part of the race, and we can do it more simply.
# 1.69	20-Jul-2020	riastradh	Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled." This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
# 1.68	13-Jul-2020	riastradh	Limit x86 fpu_kern_enter/leave to IPL_VM or below. There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads. This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation). KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear and fpu_save_area_reset -> fpu_sigreset. Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN: common sources required for baseline XEN support; XENPV: sources required for support of XEN in PV mode; XENPVHVM: sources required for support of XEN in HVM mode; XENPVH: sources required for support of XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
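A sketch of the idea in this entry: copy only the bytes the CPU's save format uses, not the whole area embedded in the PCB. struct demo_pcb, DEMO_MAX_FPU_AREA and demo_copy_fpu_area are hypothetical. The example sizes come from the x86 formats: FNSAVE stores 108 bytes in 32-bit protected mode, and FXSAVE stores 512.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical PCB whose embedded area is sized for the largest format;
 * the size here is illustrative only. */
#define DEMO_MAX_FPU_AREA 1024
struct demo_pcb {
	unsigned char fpu_area[DEMO_MAX_FPU_AREA];
};

/* Copy only the bytes actually used by the runtime save format. */
static void demo_copy_fpu_area(struct demo_pcb *dst, const struct demo_pcb *src,
    size_t save_size)
{
	assert(save_size <= sizeof(dst->fpu_area));
	memcpy(dst->fpu_area, src->fpu_area, save_size);
}
```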
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug: for some reason the stack alignment is wrong, which causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8, and as seen several months ago as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix those.
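The fault described here is an alignment problem. FXSAVE/FXRSTOR require a 16-byte-aligned memory operand, and the XSAVE family requires 64-byte alignment. A pointer with rdi % 16 = 8 therefore faults. The hypothetical helpers below check alignment and show one way to avoid the problem: declare the save buffer with an explicit C11 alignment instead of relying on the stack layout. This is not the kernel's actual fix.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* True if p is a multiple of align (align must be a power of two). */
static bool demo_aligned(const void *p, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t)p % align) == 0;
}

/* An explicitly aligned buffer satisfies both FXSAVE (16) and XSAVE (64). */
static int demo_probe(void)
{
	alignas(64) unsigned char area[512];

	return demo_aligned(area, 16) && demo_aligned(area, 64);
}
```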
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be; only the Xen case checks the flag now; need to investigate further why exactly the fault happens for the Xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication; tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with the no-xsave flag, so it should also work on those AMD CPUs which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3; fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, which are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. XSAVEOPT is an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcpy'd into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault; we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2, do float operation, switch lwp to cpu3, do float operation. Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
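This entry and the 1.19 masking fix can be illustrated together. The Intel SDM says that if the MXCSR_MASK field of an FXSAVE image is 0, software should assume 0x0000FFBF. That value has bit 6 (DAZ) clear. A hard-coded 0xFFBF therefore silently drops DAZ on CPUs that support it, so the mask must be detected at run time. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MXCSR_DAZ        0x0040u   /* denormals-are-zero, bit 6 */
#define DEMO_MXCSR_MASK_DFLT  0xFFBFu   /* assumed when FXSAVE reports 0 */

/* Choose the mask from the MXCSR_MASK field of an FXSAVE image. */
static uint32_t demo_effective_mxcsr_mask(uint32_t fxsave_mask_field)
{
	return fxsave_mask_field != 0 ? fxsave_mask_field : DEMO_MXCSR_MASK_DFLT;
}

/* Clear unsupported bits in a user-supplied MXCSR so that loading it
 * with FXRSTOR/XRSTOR cannot fault on reserved bits. */
static uint32_t demo_sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	assert(mask != 0);
	return user_mxcsr & mask;
}
```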
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
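A sketch of the fix in this entry. When XRSTOR sees XSTATE_BV[i] = 0 for a requested component, it loads that component's initial state and ignores the data in memory. After filling the legacy x87/SSE region by hand, software must therefore set the matching bits. The bit numbers (x87 = bit 0, SSE = bit 1) follow the XCR0 numbering. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_XCR0_X87  0x1u
#define DEMO_XCR0_SSE  0x2u

/* Hypothetical, minimal view of the XSAVE header. */
struct demo_xsave_hdr {
	uint64_t xstate_bv;
};

/* Mark the just-filled x87 and SSE components as live, keeping any
 * other bits already set. */
static void demo_mark_legacy_filled(struct demo_xsave_hdr *h)
{
	assert(h != NULL);
	h->xstate_bv |= DEMO_XCR0_X87 | DEMO_XCR0_SSE;
}
```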
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present; x86 without an FPU is not supported. Also delete the newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
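The value machdep.xsave_features = 7 quoted in this entry is a bitmask of XSAVE state components, using the XCR0 numbering: bit 0 is x87, bit 1 is SSE, bit 2 is AVX. The hypothetical helper below decodes the low bits into names. So 7 reads as x87, SSE and AVX.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Names for the low XSAVE feature bits (XCR0 numbering). */
static const char *const demo_xfeature_names[] = {
	"x87", "SSE", "AVX",
};

/* Write a comma-separated list of the named features set in 'mask'. */
static void demo_decode_xfeatures(uint64_t mask, char *out, size_t outlen)
{
	size_t n = sizeof(demo_xfeature_names) / sizeof(demo_xfeature_names[0]);

	assert(outlen > 0);
	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (!(mask & ((uint64_t)1 << i)))
			continue;
		if (out[0] != '\0')
			strncat(out, ",", outlen - strlen(out) - 1);
		strncat(out, demo_xfeature_names[i], outlen - strlen(out) - 1);
	}
}
```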
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.86	03-Mar-2023	riastradh	x86/fpu: Align savefpu to 64 bytes in fpuinit_mxcsr_mask. 16 bytes is not enough. (Is this why it never worked on Xen some years back? Got lucky and accidentally had 64-byte alignment on native x86, but not in the call stack in Xen?) XXX pullup-10
# 1.85	03-Mar-2023	riastradh	Revert "x86: Add kthread_fpu_enter/exit support, take two." kthread_fpu_enter/exit changes broke some hardware, unclear why, to investigate before fixing and reapplying these changes.
# 1.84	03-Mar-2023	riastradh	Revert "x86/fpu.c: Sprinkle KNF." kthread_fpu_enter/exit changes broke some hardware, unclear why, to investigate before fixing and reapplying these changes.
# 1.83	25-Feb-2023	riastradh	x86/fpu.c: Sprinkle KNF. No functional change intended.
# 1.82	25-Feb-2023	riastradh	x86: Add kthread_fpu_enter/exit support, take two. This time, make sure to restore the FPU state when switching to a kthread in the middle of kthread_fpu_enter/exit. This adds a single predicted-taken branch for the case of kthreads that are not in kthread_fpu_enter/exit, so it incurs a penalty only for threads that actually use it. Since it avoids FPU state switching in kthreads that do use the FPU, namely cgd worker threads, this should be a net performance win on systems using it and have negligible impact otherwise. XXX pullup-10
# 1.81	25-Feb-2023	riastradh	x86: Label boolean is_64bit argument to fpu_area_restore. No functional change intended.
# 1.80	25-Feb-2023	riastradh	x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use. In fpu_kern_enter, make sure all the MXCSR exception status bits are set when we start using the FPU, so that instructions which exhibit MCDT are unaffected by it. While here, zero all the other FPU registers in fpu_kern_enter. In principle we could skip this step on future CPUs that fix the MCDT bug, but there's probably not much benefit -- workloads that do a lot of crypto in the kernel are probably better off using kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles in the first place. For details, see: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.79	20-Aug-2022	riastradh	fpu_kern_enter/leave: Disable IPL assertions. These don't work because mutex_enter/exit on a spin lock may raise an IPL but not lower it, if another spin lock was already held. For example, mutex_enter(some_lock_at_IPL_VM); printf("foo\n"); fpu_kern_enter(); ... fpu_kern_leave(); mutex_exit(some_lock_at_IPL_VM); will trigger the panic, because printf takes a lock at IPL_HIGH where the IPL wil remain until the mutex_exit. (This was a nightmare to track down before I remembered that detail of spin lock IPL semantics...)
# 1.78	24-May-2022	andvar	fix various typos in comments, docs and log messages.
# 1.77	01-Apr-2022	riastradh	x86, arm: Allow fpu_kern_enter/leave while cold. Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point. Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited. Also initialize cpu0's ci_kfpu_spl.
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.76	24-Oct-2020	mgorny	Issue 64-bit versions of XSAVE for 64-bit amd64 programs When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area. The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs. The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code. This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
# 1.75	15-Oct-2020	mgorny	Revert "Merge convert_xmm_s87.c into fpu.c" I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
# 1.74	02-Aug-2020	riastradh	Revert "Add kthread_fpu_enter/exit support to x86." for now. Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
# 1.73	01-Aug-2020	riastradh	Add kthread_fpu_enter/exit support to x86.
# 1.72	20-Jul-2020	riastradh	Fix fpu_kern_enter in a softint that interrupted a softint. We need to find the lwp that was originally interrupted to save its fpu state. With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
# 1.71	20-Jul-2020	riastradh	Save fpu state at IPL_VM to exclude fpu_kern_enter/leave. This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not. (I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
# 1.70	20-Jul-2020	riastradh	Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint." This only fixed part of the race, and we can do it more simply.
# 1.69	20-Jul-2020	riastradh	Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled." This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
# 1.68	13-Jul-2020	riastradh	Limit x86 fpu_kern_enter/leave to IPL_VM or below. There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads. This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation). KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is copied with memcpy into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault; we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to XRSTOR is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and XSAVEOPT will optimize this state. Small benchmark: switch lwp to cpu2, do float operation, switch lwp to cpu3, do float operation. Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. x86 without an FPU is not supported. Also delete the newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt), machdep.xsave_size = 832, machdep.xsave_features = 7. Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.81	25-Feb-2023	riastradh	x86: Label boolean is_64bit argument to fpu_area_restore. No functional change intended.
# 1.80	25-Feb-2023	riastradh	x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use. In fpu_kern_enter, make sure all the MXCSR exception status bits are set when we start using the FPU, so that instructions which exhibit MCDT are unaffected by it. While here, zero all the other FPU registers in fpu_kern_enter. In principle we could skip this step on future CPUs that fix the MCDT bug, but there's probably not much benefit -- workloads that do a lot of crypto in the kernel are probably better off using kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles in the first place. For details, see: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.79	20-Aug-2022	riastradh	fpu_kern_enter/leave: Disable IPL assertions. These don't work because mutex_enter/exit on a spin lock may raise an IPL but not lower it, if another spin lock was already held. For example, mutex_enter(some_lock_at_IPL_VM); printf("foo\n"); fpu_kern_enter(); ... fpu_kern_leave(); mutex_exit(some_lock_at_IPL_VM); will trigger the panic, because printf takes a lock at IPL_HIGH where the IPL will remain until the mutex_exit. (This was a nightmare to track down before I remembered that detail of spin lock IPL semantics...)
# 1.78	24-May-2022	andvar	fix various typos in comments, docs and log messages.
# 1.77	01-Apr-2022	riastradh	x86, arm: Allow fpu_kern_enter/leave while cold. Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point. Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited. Also initialize cpu0's ci_kfpu_spl.
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.76	24-Oct-2020	mgorny	Issue 64-bit versions of XSAVE for 64-bit amd64 programs When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area. The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs. The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code. This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
# 1.75	15-Oct-2020	mgorny	Revert "Merge convert_xmm_s87.c into fpu.c" I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
# 1.74	02-Aug-2020	riastradh	Revert "Add kthread_fpu_enter/exit support to x86." for now. Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
# 1.73	01-Aug-2020	riastradh	Add kthread_fpu_enter/exit support to x86.
# 1.72	20-Jul-2020	riastradh	Fix fpu_kern_enter in a softint that interrupted a softint. We need to find the lwp that was originally interrupted to save its fpu state. With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
# 1.71	20-Jul-2020	riastradh	Save fpu state at IPL_VM to exclude fpu_kern_enter/leave. This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not. (I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
# 1.70	20-Jul-2020	riastradh	Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint." This only fixed part of the race, and we can do it more simply.
# 1.69	20-Jul-2020	riastradh	Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled." This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
# 1.68	13-Jul-2020	riastradh	Limit x86 fpu_kern_enter/leave to IPL_VM or below. There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads. This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation). KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be; only the xen case checks the flag now. Need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication. Tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3. Fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. no FPU x86 is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
# 1.79	20-Aug-2022	riastradh	fpu_kern_enter/leave: Disable IPL assertions. These don't work because mutex_enter/exit on a spin lock may raise an IPL but not lower it, if another spin lock was already held. For example, mutex_enter(some_lock_at_IPL_VM); printf("foo\n"); fpu_kern_enter(); ... fpu_kern_leave(); mutex_exit(some_lock_at_IPL_VM); will trigger the panic, because printf takes a lock at IPL_HIGH where the IPL will remain until the mutex_exit. (This was a nightmare to track down before I remembered that detail of spin lock IPL semantics...)
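The IPL behaviour behind this revision can be reproduced with a small user-space model. This is a hypothetical sketch, not NetBSD's mutex implementation: it assumes a per-CPU count of held spin mutexes plus the IPL saved by the outermost mutex_enter. The names (`cpu_model`, `spin_enter`, `spin_exit`) and the numeric IPL values are invented for illustration.

```c
#include <assert.h>

/* Hypothetical IPL ordering, values chosen for illustration only. */
enum ipl { IPL_NONE = 0, IPL_VM = 6, IPL_SCHED = 7, IPL_HIGH = 8 };

/* Hypothetical per-CPU state modelled on the log message. */
struct cpu_model {
	int ilevel;       /* current interrupt priority level */
	int mtx_count;    /* number of spin mutexes currently held */
	int mtx_oldspl;   /* IPL to restore when the last one is released */
};

/* Raise the IPL (never lower it) and, for the outermost spin mutex,
 * remember the IPL that was in effect before it was taken. */
static void
spin_enter(struct cpu_model *ci, int ipl)
{
	int s = ci->ilevel;

	if (ipl > ci->ilevel)
		ci->ilevel = ipl;
	if (ci->mtx_count++ == 0)
		ci->mtx_oldspl = s;
}

/* Only releasing the last held spin mutex lowers the IPL again. */
static void
spin_exit(struct cpu_model *ci)
{
	assert(ci->mtx_count > 0);
	if (--ci->mtx_count == 0)
		ci->ilevel = ci->mtx_oldspl;
}
```

Replaying the log's sequence (take a lock at IPL_VM, then take and release printf's lock at IPL_HIGH) leaves the modelled CPU at IPL_HIGH until the outer mutex_exit, which is why an `ilevel <= IPL_VM` assertion in fpu_kern_enter could fire.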
# 1.78	24-May-2022	andvar	fix various typos in comments, docs and log messages.
# 1.77	01-Apr-2022	riastradh	x86, arm: Allow fpu_kern_enter/leave while cold. Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point. Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited. Also initialize cpu0's ci_kfpu_spl.
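The two checks described here (no fpu_kern_enter above IPL_VM unless cold, and a per-CPU saved-SPL slot that must start at -1 to detect reentrance) can be sketched as follows. This is a hypothetical model: `kfpu_cpu`, `model_kern_enter` and `model_kern_leave` are invented names, the IPL values are illustrative, and where the kernel uses KASSERT this sketch returns error codes so each case can be exercised. Raising to IPL_VM on entry is an assumption based on the IPL_VM limit described in revisions 1.68 and 1.71.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IPL values for illustration only. */
enum { IPL_NONE = 0, IPL_VM = 6, IPL_SCHED = 7, IPL_HIGH = 8 };

/* Hypothetical per-CPU state: kfpu_spl == -1 means "not inside
 * fpu_kern_enter/leave", so it has to be initialised to -1. */
struct kfpu_cpu {
	int ilevel;
	int kfpu_spl;
};

enum { KFPU_OK = 0, KFPU_EIPL = -1, KFPU_EREENTER = -2 };

static int
model_kern_enter(struct kfpu_cpu *ci, bool cold)
{
	/* Forbidden above IPL_VM, except while cold (interrupts are
	 * blocked anyway during early boot). */
	if (ci->ilevel > IPL_VM && !cold)
		return KFPU_EIPL;
	/* Reentrance is prohibited. */
	if (ci->kfpu_spl != -1)
		return KFPU_EREENTER;
	ci->kfpu_spl = ci->ilevel;        /* remember the caller's IPL */
	if (ci->ilevel < IPL_VM)
		ci->ilevel = IPL_VM;      /* assumed: run the section at IPL_VM */
	return KFPU_OK;
}

static void
model_kern_leave(struct kfpu_cpu *ci)
{
	assert(ci->kfpu_spl != -1);       /* must pair with a successful enter */
	ci->ilevel = ci->kfpu_spl;
	ci->kfpu_spl = -1;
}
```

A slot left zero-initialised looks exactly like "already entered at IPL 0", so the very first call would be misreported as reentrance, which is the kind of spurious assertion the -1 initialisation avoids.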
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.76	24-Oct-2020	mgorny	Issue 64-bit versions of XSAVE for 64-bit amd64 programs. When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64, use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area. The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs. The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code. This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
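The truncation described above can be illustrated with a model of the two encodings. The union below is hypothetical: its field names echo `fa_64`/`fa_32` from the log message, but it is not the NetBSD header definition. It shows that both layouts occupy the same 8 bytes, yet the legacy one only has room for the low 32 bits of a 64-bit instruction or data pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two FIP/FDP encodings. */
union fp_addr_model {
	uint64_t fa_64;              /* FXSAVE64: full 64-bit address */
	struct {
		uint32_t fa_off;     /* legacy FXSAVE: 32-bit offset */
		uint16_t fa_seg;     /* 16-bit segment selector */
		uint16_t fa_rsvd;    /* 16-bit reserved */
	} fa_32;
};

/* What a legacy-format save can record for a 64-bit address. */
static union fp_addr_model
legacy_save(uint64_t addr, uint16_t seg)
{
	union fp_addr_model a = { 0 };

	a.fa_32.fa_off = (uint32_t)addr;   /* upper 32 bits are lost */
	a.fa_32.fa_seg = seg;
	return a;
}

/* What a 64-bit-format save records. */
static union fp_addr_model
save64(uint64_t addr)
{
	union fp_addr_model a = { 0 };

	a.fa_64 = addr;
	return a;
}
```

For a typical user-space address such as 0x00007f1234567890, the legacy form keeps only 0x34567890, which is why the old data were not meaningful for 64-bit programs.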
# 1.75	15-Oct-2020	mgorny	Revert "Merge convert_xmm_s87.c into fpu.c". I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
# 1.74	02-Aug-2020	riastradh	Revert "Add kthread_fpu_enter/exit support to x86." for now. Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
# 1.73	01-Aug-2020	riastradh	Add kthread_fpu_enter/exit support to x86.
# 1.72	20-Jul-2020	riastradh	Fix fpu_kern_enter in a softint that interrupted a softint. We need to find the lwp that was originally interrupted to save its fpu state. With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
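The fix amounts to walking back through nested soft interrupts until reaching the lwp that was running before any of them. The sketch below is a hypothetical model of that walk: `lwp_model`, `is_softint` and `interrupted` are invented for illustration, and the kernel's own bookkeeping for interrupted lwps differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lwp model: a soft interrupt thread remembers the lwp it
 * interrupted, which may itself be a soft interrupt thread. */
struct lwp_model {
	const char *name;
	bool is_softint;
	struct lwp_model *interrupted;
};

/* Follow the chain to the lwp that was running before the first soft
 * interrupt, i.e. the one whose fpu state must be saved. */
static struct lwp_model *
original_lwp(struct lwp_model *l)
{
	while (l->is_softint) {
		assert(l->interrupted != NULL);
		l = l->interrupted;
	}
	return l;
}
```

Stopping after a single step would return the intermediate softint lwp when one softint interrupted another, and the user lwp's fpu state would not be saved.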
# 1.71	20-Jul-2020	riastradh	Save fpu state at IPL_VM to exclude fpu_kern_enter/leave. This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not. (I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
# 1.70	20-Jul-2020	riastradh	Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint." This only fixed part of the race, and we can do it more simply.
# 1.69	20-Jul-2020	riastradh	Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled." This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
# 1.68	13-Jul-2020	riastradh	Limit x86 fpu_kern_enter/leave to IPL_VM or below. There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads. This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation). KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE. Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in the 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
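The compatibility argument (the caller passes the buffer size through a `struct iovec`, so older binaries get partial reads) can be sketched as a min-length copy. Both structure layouts below are hypothetical stand-ins, not NetBSD's `struct xstate`; only the idea of copying at most `iov_len` bytes of a newer, prefix-compatible structure comes from the log.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical "v1" layout an old program was built against, and a
 * later "v2" layout that appends a component. The prefix is shared. */
struct xs_v1 { uint64_t rfbm; uint8_t comp_a[16]; };
struct xs_v2 { uint64_t rfbm; uint8_t comp_a[16]; uint8_t comp_b[32]; };

/* Model of a "get" request: copy at most iov_len bytes of the kernel's
 * (newer) structure. Returns the number of bytes copied. */
static size_t
model_getxstate(const struct xs_v2 *k, const struct iovec *iov)
{
	size_t n = iov->iov_len < sizeof(*k) ? iov->iov_len : sizeof(*k);

	memcpy(iov->iov_base, k, n);
	return n;
}
```

An old caller passing `sizeof(struct xs_v1)` receives a valid prefix and never sees, or overruns into, the appended component.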
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear and fpu_save_area_reset -> fpu_sigreset. Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support; XENPV - sources required for support of XEN in PV mode; XENPVHVM - sources required for support for XEN in HVM mode; XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2, do float operation, switch lwp to cpu3, do float operation. Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
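A sketch of the masking introduced in 1.19 and the dynamic mask from this revision, assuming (as the Intel SDM specifies) that an MXCSR_MASK field of zero in an FXSAVE image means the default mask 0xFFBF. That default has bit 6 (DAZ) clear, so applying it unconditionally strips DAZ even on CPUs that support it. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MXCSR_DAZ            0x0040u   /* bit 6: denormals-are-zero */
#define MXCSR_DEFAULT_MASK   0xFFBFu   /* architectural default, DAZ clear */

/* An MXCSR_MASK of 0 in the FXSAVE image means "use the default". */
static uint32_t
effective_mxcsr_mask(uint32_t fxsave_mxcsr_mask)
{
	return fxsave_mxcsr_mask != 0 ? fxsave_mxcsr_mask : MXCSR_DEFAULT_MASK;
}

/* Clear bits the CPU does not support before the value reaches
 * FXRSTOR/XRSTOR; a set reserved bit would make the restore fault. */
static uint32_t
sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```

With the hard-coded default, a requested 0x1FC0 (the usual 0x1F80 plus DAZ) comes back as 0x1F80; with a detected mask of 0xFFFF it is preserved.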
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
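Why XSTATE_BV matters here: in the standard XRSTOR form, each component requested in the restore mask is loaded from memory only if its XSTATE_BV bit is set; otherwise it is reset to its init state. The simplified model below assumes three components (x87, SSE, AVX) and uses a single word per component, with 0 standing in for the init state; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NCOMP 3   /* model only x87 (0), SSE (1), AVX (2) */

/* Hypothetical save area: one word stands in for each component. */
struct xsave_model {
	uint64_t xstate_bv;        /* header bitmap: which components are valid */
	uint64_t comp[NCOMP];      /* saved contents */
};

/* Simplified standard-form XRSTOR: for each component in rfbm, load
 * it if XSTATE_BV[i] is set, otherwise reset it to the init state.
 * Components outside rfbm are left untouched. */
static void
model_xrstor(uint64_t regs[NCOMP], const struct xsave_model *area,
    uint64_t rfbm)
{
	for (int i = 0; i < NCOMP; i++) {
		if (!(rfbm & (UINT64_C(1) << i)))
			continue;
		if (area->xstate_bv & (UINT64_C(1) << i))
			regs[i] = area->comp[i];
		else
			regs[i] = 0;   /* init state */
	}
}
```

Filling `comp[1]` from a setcontext() call without setting bit 1 of `xstate_bv` restores the init state instead of the supplied values, matching the bug this revision fixes.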
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. x86 without an FPU is not supported. Also delete newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt), machdep.xsave_size = 832, machdep.xsave_features = 7. Completely common up the i386 and amd64 machdep sysctl creation.
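The reported values are consistent with the standard (non-compacted) XSAVE layout: feature mask 7 is x87 (bit 0), SSE (bit 1) and AVX (bit 2), and 832 bytes is the 512-byte legacy region plus the 64-byte XSAVE header plus the 256-byte AVX component at offset 576. On real hardware the size is reported by CPUID leaf 0xD; the hypothetical helper below only reproduces the arithmetic for these three components.

```c
#include <assert.h>
#include <stdint.h>

#define XCR0_X87  (UINT64_C(1) << 0)
#define XCR0_SSE  (UINT64_C(1) << 1)
#define XCR0_YMM  (UINT64_C(1) << 2)   /* AVX: upper halves of ymm0-15 */

#define XSAVE_LEGACY_SIZE  512u   /* FXSAVE-compatible region (x87 + SSE) */
#define XSAVE_HEADER_SIZE   64u   /* XSTATE_BV, XCOMP_BV, reserved */
#define XSAVE_YMM_OFFSET   576u   /* standard-format offset of AVX state */
#define XSAVE_YMM_SIZE     256u   /* 16 registers x 16 bytes */

/* Standard-format XSAVE area size for a mask containing at most x87,
 * SSE and AVX; larger components are out of scope for this sketch. */
static unsigned
xsave_size_for(uint64_t features)
{
	unsigned size = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;

	assert((features & ~(XCR0_X87 | XCR0_SSE | XCR0_YMM)) == 0);
	if (features & XCR0_YMM)
		size = XSAVE_YMM_OFFSET + XSAVE_YMM_SIZE;
	return size;
}
```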
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.78	24-May-2022	andvar	fix various typos in comments, docs and log messages.
# 1.77	01-Apr-2022	riastradh	x86, arm: Allow fpu_kern_enter/leave while cold. Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point. Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited. Also initialize cpu0's ci_kfpu_spl.
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.76	24-Oct-2020	mgorny	Issue 64-bit versions of XSAVE for 64-bit amd64 programs When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area. The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs. The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code. This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
# 1.75	15-Oct-2020	mgorny	Revert "Merge convert_xmm_s87.c into fpu.c" I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
# 1.74	02-Aug-2020	riastradh	Revert "Add kthread_fpu_enter/exit support to x86." for now. Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
# 1.73	01-Aug-2020	riastradh	Add kthread_fpu_enter/exit support to x86.
# 1.72	20-Jul-2020	riastradh	Fix fpu_kern_enter in a softint that interrupted a softint. We need to find the lwp that was originally interrupted to save its fpu state. With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
# 1.71	20-Jul-2020	riastradh	Save fpu state at IPL_VM to exclude fpu_kern_enter/leave. This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not. (I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
# 1.70	20-Jul-2020	riastradh	Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint." This only fixed part of the race, and we can do it more simply.
# 1.69	20-Jul-2020	riastradh	Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled." This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
# 1.68	13-Jul-2020	riastradh	Limit x86 fpu_kern_enter/leave to IPL_VM or below. There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads. This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation). KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear, fpu_save_area_reset -> fpu_sigreset. Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE, as it should be; only the xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication; tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcpy'd into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2, do a float operation, switch lwp to cpu3, do a float operation. Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present: x86 without an FPU is not supported. Also delete the newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt), machdep.xsave_size = 832, machdep.xsave_features = 7. Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK: it needs to be detected dynamically, otherwise when masking MXCSR we lose some features (e.g. DAZ).
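The MXCSR handling in 1.21 and 1.19 comes down to reading the mask the CPU reports and ANDing user-supplied values with it. A minimal sketch with hypothetical helper names; it assumes the architectural FXSAVE layout (MXCSR_MASK at byte offset 28) and the Intel SDM rule that a zero mask field means the default 0x0000FFBF. That default lacks DAZ (bit 6), which illustrates how a fixed, conservative mask can drop a feature the CPU actually supports.

```c
#include <assert.h>
#include <stdint.h>

/* Intel SDM: an MXCSR_MASK field of 0 means the default mask 0x0000FFBF. */
#define MXCSR_DEFAULT_MASK	UINT32_C(0x0000FFBF)
#define MXCSR_DAZ		UINT32_C(0x00000040)	/* denormals-are-zero, bit 6 */

/* MXCSR_MASK sits at byte offset 28 of the 512-byte FXSAVE image. */
#define FXSAVE_MXCSR_MASK_OFF	28

/* Hypothetical helper: read the mask from an FXSAVE image (little-endian). */
static uint32_t
mxcsr_mask_from_fxsave(const unsigned char fx[512])
{
	const unsigned char *p = fx + FXSAVE_MXCSR_MASK_OFF;
	uint32_t mask = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;

	return mask != 0 ? mask : MXCSR_DEFAULT_MASK;
}

/* Drop bits the CPU reports as reserved, so (FX)RSTOR cannot fault on them. */
static uint32_t
mxcsr_sanitize(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```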
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev 1.15, xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present: x86 without an FPU is not supported. Also delete the newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt), machdep.xsave_size = 832, machdep.xsave_features = 7. Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
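Assuming machdep.xsave_features uses the XCR0 bit numbering, the value 7 reported in 1.7 decodes to x87 | SSE | AVX, which also fits machdep.xsave_size = 832 (512-byte legacy region + 64-byte XSAVE header + 256 bytes of AVX upper-YMM state). A small decoder sketch; `xsave_features_str()` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* XSAVE state components, indexed by XCR0 bit number. */
static const char *const xsave_component_names[] = {
	"x87", "SSE", "AVX", "BNDREGS", "BNDCSR", "opmask", "ZMM_Hi256", "Hi16_ZMM",
};

/* Write a comma-separated list of the set components into buf (len >= 1). */
static void
xsave_features_str(uint64_t features, char *buf, size_t len)
{
	size_t n = sizeof(xsave_component_names) / sizeof(xsave_component_names[0]);

	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if ((features & (UINT64_C(1) << i)) == 0)
			continue;
		if (buf[0] != '\0')
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, xsave_component_names[i], len - strlen(buf) - 1);
	}
}
```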
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.76	24-Oct-2020	mgorny	Issue 64-bit versions of XSAVE for 64-bit amd64 programs When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area. The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs. The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code. This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
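The fa_32 / fa_64 distinction in 1.76 is easiest to see as two views of the same 8 bytes. The union below is a simplified model shaped after that description; its field names are illustrative and not taken from NetBSD's headers. It shows why a legacy FXSAVE image cannot hold a 64-bit FIP:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the FIP/FDP slot in an FXSAVE image. */
union fp_addr_model {
	uint64_t fa_64;			/* FXSAVE64: full 64-bit address */
	struct {
		uint32_t fa_off;	/* FXSAVE: low 32 bits of the address */
		uint16_t fa_seg;	/* segment selector */
		uint16_t fa_rsvd;	/* reserved */
	} fa_32;
};

/* What legacy FXSAVE can record: the address is cut to 32 bits. */
static union fp_addr_model
record_legacy(uint64_t addr, uint16_t seg)
{
	union fp_addr_model a = { 0 };

	a.fa_32.fa_off = (uint32_t)addr;
	a.fa_32.fa_seg = seg;
	return a;
}

/* What FXSAVE64 records: the full address. */
static union fp_addr_model
record_64(uint64_t addr)
{
	union fp_addr_model a = { 0 };

	a.fa_64 = addr;
	return a;
}
```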
# 1.75	15-Oct-2020	mgorny	Revert "Merge convert_xmm_s87.c into fpu.c" I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
# 1.74	02-Aug-2020	riastradh	Revert "Add kthread_fpu_enter/exit support to x86." for now. Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
# 1.73	01-Aug-2020	riastradh	Add kthread_fpu_enter/exit support to x86.
# 1.72	20-Jul-2020	riastradh	Fix fpu_kern_enter in a softint that interrupted a softint. We need to find the lwp that was originally interrupted to save its fpu state. With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
# 1.71	20-Jul-2020	riastradh	Save fpu state at IPL_VM to exclude fpu_kern_enter/leave. This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not. (I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
# 1.70	20-Jul-2020	riastradh	Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint." This only fixed part of the race, and we can do it more simply.
# 1.69	20-Jul-2020	riastradh	Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled." This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
# 1.68	13-Jul-2020	riastradh	Limit x86 fpu_kern_enter/leave to IPL_VM or below. There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads. This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation). KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
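The enter/leave discipline described in 1.60, together with the zeroing from 1.63 and the IPL limit from 1.68, can be modelled in user space as below. Everything here is hypothetical: the `model_*` names, the IPL ordering and the no-nesting assertion are assumptions of the sketch, and the real functions also save the lwp's FPU state to memory and toggle CR0.TS (clts/stts, per 1.62), which the model only hints at.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical IPL ordering for this sketch; real values are per-port. */
enum model_ipl { MODEL_IPL_NONE, MODEL_IPL_SOFT, MODEL_IPL_VM, MODEL_IPL_SCHED, MODEL_IPL_HIGH };

struct model_cpu {
	bool		in_kern_fpu;		/* between enter and leave */
	bool		lwp_state_saved;	/* lwp's FPU state moved to memory */
	unsigned char	regs[64];		/* stand-in for the FPU registers */
};

static void
model_fpu_kern_enter(struct model_cpu *ci, enum model_ipl ipl)
{
	assert(ipl <= MODEL_IPL_VM);	/* 1.68: IPL_VM or below only */
	assert(!ci->in_kern_fpu);	/* sketch assumes no nesting */
	ci->lwp_state_saved = true;	/* save lwp state before clobbering */
	ci->in_kern_fpu = true;
}

static void
model_fpu_kern_leave(struct model_cpu *ci)
{
	assert(ci->in_kern_fpu);
	/* 1.63: zero the registers so no kernel values linger. */
	memset(ci->regs, 0, sizeof(ci->regs));
	ci->in_kern_fpu = false;
}
```

A caller brackets only the FPU-using section, e.g. `model_fpu_kern_enter(&ci, MODEL_IPL_SOFT); /* AES rounds */ model_fpu_kern_leave(&ci);`.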
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
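The iovec convention in 1.54 lets the caller state its buffer size, so the kernel can copy only the prefix the caller knows about. A sketch of that copy-out rule; `model_copy_xstate_out()` is hypothetical and this is not the actual ptrace implementation:

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy at most iov_len bytes of the kernel's structure and return how
 * many were written. An old caller with a smaller struct still receives
 * a valid prefix after the structure grows.
 */
static size_t
model_copy_xstate_out(const void *kstate, size_t klen, const struct iovec *iov)
{
	size_t n = iov->iov_len < klen ? iov->iov_len : klen;

	memcpy(iov->iov_base, kstate, n);
	return n;
}
```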
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear and fpu_save_area_reset -> fpu_sigreset. Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support; XENPV - sources required for support of XEN in PV mode; XENPVHVM - sources required for support of XEN in HVM mode; XENPVH - sources required for support of XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug: for some reason the stack is wrongly aligned, which causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	As a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE, as it should be; only the Xen case checks the flag now. Need to investigate further why exactly the fault happens for the Xen no-xsave case pointed out by maxv.
# 1.40	19-Jun-2018	jdolecek	Fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication. Tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with the no-xsave flag, so it should also work on those AMD CPUs which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3. Fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address. XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
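The fault discussed in 1.42 is an alignment violation: FXSAVE requires a 16-byte aligned memory operand and XSAVE/XSAVEOPT require 64-byte alignment, so a buffer at an address with rdi % 16 = 8 faults. A minimal check, with hypothetical names and an 832-byte area sized only for illustration:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* FXSAVE/FXRSTOR need a 16-byte aligned operand. */
static bool
fxsave_area_ok(const void *p)
{
	return ((uintptr_t)p & 15) == 0;
}

/* XSAVE/XSAVEOPT/XRSTOR need a 64-byte aligned operand. */
static bool
xsave_area_ok(const void *p)
{
	return ((uintptr_t)p & 63) == 0;
}

/* Illustrative buffer: legacy region + XSAVE header + AVX state. */
struct model_savefpu {
	alignas(64) unsigned char area[832];
};
```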
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.74	02-Aug-2020	riastradh	Revert "Add kthread_fpu_enter/exit support to x86." for now. Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
# 1.73	01-Aug-2020	riastradh	Add kthread_fpu_enter/exit support to x86.
# 1.72	20-Jul-2020	riastradh	Fix fpu_kern_enter in a softint that interrupted a softint. We need to find the lwp that was originally interrupted to save its fpu state. With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
# 1.71	20-Jul-2020	riastradh	Save fpu state at IPL_VM to exclude fpu_kern_enter/leave. This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not. (I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
# 1.70	20-Jul-2020	riastradh	Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint." This only fixed part of the race, and we can do it more simply.
# 1.69	20-Jul-2020	riastradh	Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled." This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
# 1.68	13-Jul-2020	riastradh	Limit x86 fpu_kern_enter/leave to IPL_VM or below. There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads. This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation). KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE. Introduce two new ptrace() requests, PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has a stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in the 'xs_rfbm' field, making it possible to issue partial updates. Both requests take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
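The size-tolerant iovec convention described in 1.54 can be sketched as follows. This is a minimal illustration, not the NetBSD implementation. struct xstate_v2 and copyout_xstate() are hypothetical names, only the xs_rfbm field name comes from the log, and memcpy stands in for the kernel's real copyout path. The copy length is the smaller of the caller's buffer and the current structure, so a program built against an older, shorter structure still receives a valid prefix.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical versioned structure; newer versions only append fields. */
struct xstate_v2 {
	uint64_t xs_rfbm;        /* requested-feature bitmap */
	uint8_t  xs_x87[16];     /* placeholder for an "old" component */
	uint8_t  xs_ymm_hi[32];  /* placeholder for a field appended later */
};

/*
 * Copy at most iov_len bytes: an old caller that passes a smaller
 * buffer gets a valid prefix of the structure. A newer caller with a
 * large buffer gets exactly sizeof(struct xstate_v2) bytes.
 */
static size_t
copyout_xstate(const struct xstate_v2 *src, struct iovec *iov)
{
	size_t n = iov->iov_len < sizeof(*src) ? iov->iov_len : sizeof(*src);

	memcpy(iov->iov_base, src, n);
	return n;
}
```

For example, a caller that passes an 8-byte buffer gets back exactly 8 bytes, the start of xs_rfbm, even though the structure is larger.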
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear and fpu_save_area_reset -> fpu_sigreset. Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication; tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3; fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1}. Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is copied into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2, do a float operation, switch lwp to cpu3, do a float operation. Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
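Revision 1.24 states a rule: software that edits the in-memory FPU image must set the matching XSTATE_BV bit. A simplified sketch of that rule follows. struct xsave_image and image_set_fcw() are hypothetical. A real XSAVE area has a 512-byte legacy region followed by a 64-byte header, and this sketch does not model that layout. Only the bit positions follow the architectural layout: component 0 is x87, 1 is SSE and 2 is AVX.

```c
#include <stdint.h>

/* XSAVE state-component bits: bit i of XCR0/XSTATE_BV is component i. */
#define XSTATE_X87 (UINT64_C(1) << 0)
#define XSTATE_SSE (UINT64_C(1) << 1)
#define XSTATE_AVX (UINT64_C(1) << 2)

/* Hypothetical, heavily simplified in-memory image. */
struct xsave_image {
	uint16_t fcw;        /* x87 control word (legacy region) */
	uint64_t xstate_bv;  /* XSAVE header: components holding saved data */
};

/*
 * Software changed the x87 control word in memory. Mark component 0
 * as present so the next XRSTOR loads fcw from the image. If the bit
 * stayed 0, XRSTOR would put x87 into its initial configuration
 * (FCW = 0x037F) and the edited value would be silently ignored.
 */
static void
image_set_fcw(struct xsave_image *img, uint16_t cw)
{
	img->fcw = cw;
	img->xstate_bv |= XSTATE_X87;
}
```

Setting only the bit of the component that was actually touched leaves the other components alone. XSAVEOPT can then keep treating those components as unmodified or in their initial state.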
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
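The MXCSR masking described in 1.19 and 1.21 amounts to ANDing a user-supplied MXCSR with the mask the CPU reports. The sketch below relies on the Intel SDM rule that an MXCSR_MASK of zero in the FXSAVE image means the default 0x0000FFBF, which covers every bit except DAZ. The helper names are hypothetical, not the ones used in fpu.c. The sketch shows why a hardcoded mask loses DAZ on CPUs that support it.

```c
#include <stdint.h>

/* SDM default when FXSAVE reports MXCSR_MASK == 0: all but DAZ (bit 6). */
#define DEFAULT_MXCSR_MASK UINT32_C(0x0000FFBF)
#define MXCSR_DAZ          UINT32_C(0x00000040)

/* Pick the effective mask from the MXCSR_MASK field of an FXSAVE image. */
static uint32_t
effective_mxcsr_mask(uint32_t fxsave_mxcsr_mask)
{
	return fxsave_mxcsr_mask != 0 ? fxsave_mxcsr_mask : DEFAULT_MXCSR_MASK;
}

/*
 * Clear bits the CPU does not implement, so that a later FXRSTOR or
 * XRSTOR of a user-supplied value cannot raise #GP on reserved bits.
 */
static uint32_t
sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```

Take a user value of 0x11FC0, which is the default 0x1F80 plus DAZ plus reserved bit 16. A detected mask of 0xFFFF keeps DAZ and yields 0x1FC0. The static default mask 0xFFBF yields 0x1F80.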
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev 1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. x86 without an FPU is not supported. Also delete the newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt), machdep.xsave_size = 832, machdep.xsave_features = 7. Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.68	13-Jul-2020	riastradh	Limit x86 fpu_kern_enter/leave to IPL_VM or below. There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads. This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation). KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
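Revisions 1.19 and 1.21 work together: the kernel learns which MXCSR bits the CPU supports, then ANDs any user-supplied MXCSR with that mask so that reserved bits never reach FXRSTOR/XRSTOR (which fault on them). A minimal sketch, with hypothetical helper names: per the Intel SDM, FXSAVE stores MXCSR at bytes 24 to 27 and MXCSR_MASK at bytes 28 to 31, and a mask field of zero means the architectural default 0xFFBF. That default clears bit 6 (DAZ), which is why a hard-coded mask loses DAZ on CPUs that do support it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Architectural default when the FXSAVE MXCSR_MASK field reads as zero. */
#define TOY_DEFAULT_MXCSR_MASK	UINT32_C(0x0000FFBF)
/* Denormals-are-zero flag, bit 6 of MXCSR. */
#define TOY_MXCSR_DAZ		UINT32_C(0x00000040)

/* FXSAVE image layout: MXCSR at bytes 24..27, MXCSR_MASK at bytes 28..31. */
#define TOY_FX_MXCSR		24
#define TOY_FX_MXCSR_MASK	28

/* Boot-time step: read the mask the CPU reported in an FXSAVE image. */
static uint32_t
toy_mxcsr_mask_from_fxsave(const uint8_t fx[512])
{
	uint32_t mask;

	memcpy(&mask, fx + TOY_FX_MXCSR_MASK, sizeof(mask));
	return mask != 0 ? mask : TOY_DEFAULT_MXCSR_MASK;
}

/* Per-request step: drop reserved bits from a user-supplied MXCSR. */
static void
toy_sanitize_user_fxsave(uint8_t fx[512], uint32_t mxcsr_mask)
{
	uint32_t mxcsr;

	memcpy(&mxcsr, fx + TOY_FX_MXCSR, sizeof(mxcsr));
	mxcsr &= mxcsr_mask;
	memcpy(fx + TOY_FX_MXCSR, &mxcsr, sizeof(mxcsr));
}
```

With a CPU-reported mask of 0xFFFF, DAZ survives sanitizing; with the 0xFFBF fallback it is cleared.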
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. FPU-less x86 is not supported. Also delete the newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
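The sysctl values quoted above can be decoded by hand. A short sketch, with hypothetical names: `machdep.xsave_features` is treated as an XSAVE component bitmask (bit numbering from the Intel SDM), so 7 means x87, SSE and AVX. The reported `machdep.xsave_size = 832` matches the standard (non-compacted) layout for those three components: a 512-byte legacy region, a 64-byte header, and 16 YMM upper halves of 16 bytes each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XSAVE state components by bit position (Intel SDM, components 0..7). */
static const char *const toy_xsave_names[] = {
	"x87", "SSE", "AVX", "BNDREGS", "BNDCSR",
	"opmask", "ZMM_Hi256", "Hi16_ZMM",
};

/* Standard-layout size for x87+SSE+AVX: legacy + header + YMM_Hi128. */
enum { TOY_XSAVE_SIZE_X87_SSE_AVX = 512 + 64 + 16 * 16 };

/* Write a space-separated list of the components set in `features`. */
static void
toy_describe_xsave_features(uint64_t features, char *buf, size_t len)
{
	size_t i;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(toy_xsave_names) / sizeof(toy_xsave_names[0]); i++) {
		if ((features & (UINT64_C(1) << i)) == 0)
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, toy_xsave_names[i], len - strlen(buf) - 1);
	}
}
```

For example, `toy_describe_xsave_features(7, buf, sizeof(buf))` yields "x87 SSE AVX", and an AVX-512 machine reporting 0xE7 would add "opmask ZMM_Hi256 Hi16_ZMM".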
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.72	20-Jul-2020	riastradh	Fix fpu_kern_enter in a softint that interrupted a softint. We need to find the lwp that was originally interrupted to save its fpu state. With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
# 1.71	20-Jul-2020	riastradh	Save fpu state at IPL_VM to exclude fpu_kern_enter/leave. This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not. (I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
# 1.70	20-Jul-2020	riastradh	Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint." This only fixed part of the race, and we can do it more simply.
# 1.69	20-Jul-2020	riastradh	Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled." This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
# 1.68	13-Jul-2020	riastradh	Limit x86 fpu_kern_enter/leave to IPL_VM or below. There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads. This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation). KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
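The enter/leave bracket and the constraints later revisions added to it (1.62 clts/stts, 1.63 zeroing on leave, 1.68 IPL_VM or below) can be modelled in userland. This is a hypothetical toy, not the kernel implementation: `struct toy_cpu`, the IPL numbers, and the assumption that sections do not nest on one CPU are illustrative choices, and the real code also saves the interrupted lwp's FPU state, which the model only notes in a comment.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical IPL values; only their ordering matters in this model. */
enum { TOY_IPL_NONE = 0, TOY_IPL_VM = 6, TOY_IPL_SCHED = 7, TOY_IPL_HIGH = 8 };

struct toy_cpu {
	int ipl;			/* current interrupt priority level */
	int kfpu_spl;			/* IPL to restore in leave; -1 if idle */
	unsigned char fpregs[64];	/* stand-in for the FPU registers */
};

static void
toy_fpu_kern_enter(struct toy_cpu *ci)
{
	assert(ci->ipl <= TOY_IPL_VM);	/* rev 1.68: IPL_VM or below only */
	assert(ci->kfpu_spl == -1);	/* assumed: no nesting on one CPU */
	ci->kfpu_spl = ci->ipl;
	ci->ipl = TOY_IPL_VM;		/* models raising to IPL_VM */
	/* Real kernel, per the log: save the interrupted lwp's state, clts. */
}

static void
toy_fpu_kern_leave(struct toy_cpu *ci)
{
	assert(ci->kfpu_spl != -1);
	memset(ci->fpregs, 0, sizeof(ci->fpregs)); /* rev 1.63: scrub secrets */
	ci->ipl = ci->kfpu_spl;			   /* models splx() */
	ci->kfpu_spl = -1;
}
```

A caller brackets its FPU work exactly once per section: `toy_fpu_kern_enter(&c); /* use FPU */ toy_fpu_kern_leave(&c);`. Calling enter from IPL_SCHED, or twice without a leave, trips an assertion, mirroring the KASSERT the 1.68 message describes.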
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
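The forward-compatibility argument for passing a `struct iovec` can be illustrated with a small model. This is a sketch under stated assumptions, not NetBSD's implementation: the two struct versions and `toy_getxstate` are hypothetical, and the only rule modelled is "copy out at most `iov_len` bytes, and only append fields in later versions", which lets a caller built against the older, smaller layout keep working.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical layouts: v2 appends a field to v1 without moving old ones. */
struct toy_xstate_v1 {
	unsigned long long rfbm;	/* which components are valid */
	unsigned char avx[256];
};
struct toy_xstate_v2 {
	unsigned long long rfbm;
	unsigned char avx[256];
	unsigned char newer[64];	/* added in the later version */
};

/* Kernel side (modelled): copy out no more than the caller's buffer holds. */
static size_t
toy_getxstate(const struct toy_xstate_v2 *src, const struct iovec *iov)
{
	size_t n = iov->iov_len < sizeof(*src) ? iov->iov_len : sizeof(*src);

	memcpy(iov->iov_base, src, n);
	return n;
}
```

An old caller passing `{ .iov_base = &v1buf, .iov_len = sizeof(v1buf) }` receives exactly the prefix it understands; a partial PT_SETXSTATE-style update would apply the same length rule in the other direction.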
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. no FPU x86 is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zerod), but I don't think the ymm registers are read/written unless they have been used. The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't pasrt of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code thate appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.68	13-Jul-2020	riastradh	Limit x86 fpu_kern_enter/leave to IPL_VM or below. There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads. This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation). KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication; tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3; fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2, do float operation, switch lwp to cpu3, do float operation. Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. no FPU x86 is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt), machdep.xsave_size = 832, machdep.xsave_features = 7. Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.67	06-Jul-2020	riastradh	Restore the lwp's fpu state, not zeros, and leave with fpu enabled. We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics. In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse. May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. no FPU x86 is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zerod), but I don't think the ymm registers are read/written unless they have been used. The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't pasrt of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code thate appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.66	06-Jul-2020	riastradh	Fix race in fpu save with fpu_kern_enter in softint. Likely source of: https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
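The two compatibility mechanisms described in the 1.54 entry (size negotiation through the caller's iovec, and partial updates selected by xs_rfbm) can be sketched in plain C. This is a minimal model under stated assumptions, not NetBSD's implementation: `struct demo_xstate`, `demo_get()` and `demo_set()` are hypothetical names, and the layout is deliberately simplified rather than the real `struct xstate`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for 'struct xstate'; NOT the real
 * NetBSD layout. Each component is modeled as a fixed 32-byte blob. */
#define DEMO_NCOMP 4
struct demo_xstate {
	uint64_t xs_rfbm;                   /* components the caller supplies */
	unsigned char comp[DEMO_NCOMP][32];
};

/* Size negotiation: copy only as many bytes as both sides understand,
 * so a caller with an older, smaller buffer gets a partial read. */
size_t
demo_get(void *ubuf, size_t ulen, const struct demo_xstate *kst)
{
	size_t n;

	assert(ubuf != NULL && kst != NULL);
	n = ulen < sizeof(*kst) ? ulen : sizeof(*kst);
	memcpy(ubuf, kst, n);
	return n;
}

/* Partial update: replace only the components whose bit is set in
 * req->xs_rfbm; every other component keeps its current value. */
void
demo_set(struct demo_xstate *kst, const struct demo_xstate *req)
{
	for (int i = 0; i < DEMO_NCOMP; i++) {
		if (req->xs_rfbm & ((uint64_t)1 << i))
			memcpy(kst->comp[i], req->comp[i], sizeof(kst->comp[i]));
	}
}
```

For example, a caller built against a smaller structure passes a shorter length and receives a truncated but well-formed prefix, while a request with only bit 1 set in xs_rfbm changes component 1 and leaves the others untouched.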
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN: common sources required for baseline XEN support; XENPV: sources required for support of XEN in PV mode; XENPVHVM: sources required for support for XEN in HVM mode; XENPVH: sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
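The idea behind the 1.46 change, copying only the bytes of the FPU area that the active save format actually uses, might look like this in isolation. `struct demo_pcb` and `demo_pcb_copy()` are hypothetical; the 4096-byte area merely stands in for "the biggest static FPU state", and 108 bytes is the size of an FNSAVE image in 32-bit protected mode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical PCB: some fixed fields, then an FPU area sized for the
 * largest format the kernel supports (an arbitrary 4096 bytes here). */
struct demo_pcb {
	unsigned long pcb_flags;
	unsigned long pcb_misc[4];
	unsigned char pcb_savefpu[4096];
};

/* Copy the fixed part of the PCB plus only 'fpu_save_size' bytes of the
 * FPU area (e.g. 108 for FNSAVE, 512 for FXSAVE); returns bytes copied. */
size_t
demo_pcb_copy(struct demo_pcb *dst, const struct demo_pcb *src,
    size_t fpu_save_size)
{
	size_t len;

	assert(fpu_save_size <= sizeof(src->pcb_savefpu));
	len = offsetof(struct demo_pcb, pcb_savefpu) + fpu_save_size;
	memcpy(dst, src, len);
	return len;
}
```

With FNSAVE the copy stops 108 bytes into the area instead of moving the full 4096, which is the saving the commit message describes.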
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
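The 1.42 entry traces the fault to a misaligned FXSAVE operand (`rdi % 16 = 8`). FXSAVE requires a 16-byte-aligned save area and XSAVE a 64-byte-aligned one; otherwise the instruction raises #GP. The sketch below, with hypothetical helper names, shows the address check and how a C11 `alignas` declaration requests the alignment explicitly instead of depending on how the stack happens to be laid out.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* True if 'p' is a multiple of 'align' (a power of two). */
static bool
is_aligned(const void *p, uintptr_t align)
{
	return ((uintptr_t)p % align) == 0;
}

/* An explicitly aligned local save area: the compiler must realign the
 * stack for it, so FXSAVE (16) and XSAVE (64) alignment both hold. */
bool
demo_area_ok(void)
{
	alignas(64) unsigned char area[512];

	return is_aligned(area, 16) && is_aligned(area, 64);
}
```

An address such as 0x1008 fails the 16-byte check, which matches the `rdi % 16 = 8` symptom reported in the commit.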
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be; only the Xen case checks the flag now. Need to investigate further why exactly the fault happens for the Xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication. Tested with Xen 4.2.6 DOM0/DOMU on an Intel CPU, with and without the no-xsave flag, so it should also work on those AMD CPUs which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3. Fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address. XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcpy'd into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2, do a float operation, switch lwp to cpu3, do a float operation. Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (e.g. DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
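Entries 1.19 and 1.21 together amount to: determine the supported MXCSR bits at boot, then AND every user-supplied MXCSR with that mask. A minimal sketch under these assumptions: the input is the mxcsr_mask field read back from an FXSAVE image, and a value of 0 there means the architectural default 0xFFBF (which lacks DAZ, bit 6), as documented in the Intel SDM. The `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Default MXCSR_MASK when FXSAVE reports 0: all bits except DAZ. */
#define DEMO_DEFAULT_MXCSR_MASK 0x0000FFBFu
#define DEMO_MXCSR_DAZ          0x00000040u   /* denormals-are-zero, bit 6 */

/* Pick the mask from the mxcsr_mask field that FXSAVE reported. */
uint32_t
demo_mxcsr_mask(uint32_t fxsave_mxcsr_mask)
{
	return fxsave_mxcsr_mask != 0 ? fxsave_mxcsr_mask
	    : DEMO_DEFAULT_MXCSR_MASK;
}

/* Clear unsupported/reserved bits of a user-supplied MXCSR so that a
 * later FXRSTOR/XRSTOR cannot fault on it. */
uint32_t
demo_sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```

Hard-coding the default mask would silently strip DAZ on CPUs that support it, which is the feature loss the 1.21 entry mentions; detecting the mask keeps it.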
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
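The 1.18 fix relies on how XRSTOR treats XSTATE_BV: for each requested component whose XSTATE_BV bit is 0, it loads that component's initial state instead of the data in memory. A sketch of installing a legacy x87/SSE image so that XRSTOR actually restores it, assuming a hypothetical, simplified `struct demo_xsave_area` in the standard (non-compacted) format:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_XCR0_X87 0x1u  /* XSAVE component 0: x87 state */
#define DEMO_XCR0_SSE 0x2u  /* XSAVE component 1: SSE (XMM, MXCSR) */

/* Hypothetical XSAVE area: the 512-byte legacy region followed by the
 * 64-byte XSAVE header, whose first field is XSTATE_BV. */
struct demo_xsave_area {
	unsigned char legacy[512];
	uint64_t xstate_bv;
	uint64_t xcomp_bv;
	uint64_t hdr_rsvd[6];
};

/* Install a legacy (FXSAVE-format) image, e.g. one built from a
 * setcontext() argument, and mark x87 and SSE as present; without the
 * XSTATE_BV bits XRSTOR would load initial values and ignore the image. */
void
demo_install_legacy(struct demo_xsave_area *a, const unsigned char img[512])
{
	memcpy(a->legacy, img, sizeof(a->legacy));
	a->xstate_bv |= DEMO_XCR0_X87 | DEMO_XCR0_SSE;
}
```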
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present; x86 without an FPU is not supported. Also delete the newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt), machdep.xsave_size = 832, machdep.xsave_features = 7. Completely common up the i386 and amd64 machdep sysctl creation.
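The sysctl values quoted in the 1.7 entry are consistent with each other: xsave_features = 7 is x87 (bit 0) | SSE (bit 1) | AVX (bit 2), and in the standard XSAVE layout that means a 512-byte legacy region, a 64-byte header and 256 bytes of AVX state, 832 bytes in total. The kernel presumably obtains the real size from CPUID leaf 0xD; the hypothetical helper below only reproduces the arithmetic for these three components.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XCR0 / xsave_features bits for the first three XSAVE components. */
#define DEMO_XCR0_X87 0x1u
#define DEMO_XCR0_SSE 0x2u
#define DEMO_XCR0_YMM 0x4u   /* AVX: upper halves of YMM0-15 */

/* Standard (non-compacted) layout sizes, in bytes. */
#define DEMO_LEGACY_SIZE 512u  /* x87 + SSE legacy region */
#define DEMO_HEADER_SIZE 64u   /* XSAVE header */
#define DEMO_YMM_SIZE    256u  /* 16 registers x 16 bytes */

/* Size of an XSAVE area for 'mask'; only valid for masks limited to
 * x87/SSE/AVX (later components need the offsets CPUID 0xD reports). */
size_t
demo_xsave_size(uint64_t mask)
{
	size_t size = DEMO_LEGACY_SIZE + DEMO_HEADER_SIZE;

	if (mask & DEMO_XCR0_YMM)
		size += DEMO_YMM_SIZE;
	return size;
}
```

So demo_xsave_size(7) gives 512 + 64 + 256 = 832, matching machdep.xsave_size, and a CPU with only x87 and SSE enabled would report 576.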
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.65	14-Jun-2020	riastradh	Use static constant rather than stack memset buffer for zero fpregs.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. no FPU x86 is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zerod), but I don't think the ymm registers are read/written unless they have been used. The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.64	13-Jun-2020	riastradh	Add comments over fpu_kern_enter/leave.
# 1.63	13-Jun-2020	riastradh	Zero the fpu registers on fpu_kern_leave. Avoid Spectre-class attacks on any values left in them.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
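The enter/leave bracket introduced in 1.60, together with the clts/stts calls from 1.62 and the register zeroing from 1.63, can be illustrated with a user-space toy model. Everything here is hypothetical (`struct toy_cpu`, `toy_fpu_kern_enter()`, `toy_fpu_kern_leave()`): booleans stand in for CR0.TS, an array stands in for the FPU registers, and the model assumes the interrupted lwp's registers are saved on entry. It is a sketch of the idea, not the NetBSD implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for per-CPU FPU state; not a real NetBSD structure. */
struct toy_cpu {
	bool ts;		/* models CR0.TS: true means FPU use traps */
	bool kern_fpu_busy;	/* inside an enter/leave section */
	double regs[8];		/* models the live FPU register file */
	double user_save[8];	/* models the saved state of the current lwp */
};

static void
toy_fpu_kern_enter(struct toy_cpu *ci)
{
	assert(!ci->kern_fpu_busy);	/* this model does not allow nesting */
	/* Assumption: save the lwp's registers so kernel code may clobber them. */
	memcpy(ci->user_save, ci->regs, sizeof(ci->regs));
	ci->ts = false;			/* models clts(): FPU usable without a trap */
	ci->kern_fpu_busy = true;
}

static void
toy_fpu_kern_leave(struct toy_cpu *ci)
{
	assert(ci->kern_fpu_busy);
	/* Zero the registers so no kernel values are left behind (1.63). */
	memset(ci->regs, 0, sizeof(ci->regs));
	ci->ts = true;			/* models stts(): trap on next FPU use */
	ci->kern_fpu_busy = false;
}
```

Kernel code would place its SIMD work between the two calls; in the model, leave() guarantees both that TS is set again and that no kernel values remain in `regs`.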
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
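The compatibility argument for passing a `struct iovec` can be sketched as follows. The structure names, field sizes and `toy_*` functions are hypothetical and much smaller than the real `struct xstate`; only the two ideas from the log entry are modelled: a GET copies at most the caller's buffer length, and a SET replaces only the components named in `xs_rfbm`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_XCR0_X87	0x1u
#define TOY_XCR0_SSE	0x2u
#define TOY_XCR0_YMM	0x4u

/* Hypothetical older layout that a debugger might have been built against. */
struct toy_xstate_v1 {
	uint64_t xs_rfbm;	/* components the caller wants to set */
	uint8_t  xs_x87[16];
	uint8_t  xs_sse[16];
};

/* Hypothetical extended layout: same prefix, one component appended. */
struct toy_xstate_v2 {
	uint64_t xs_rfbm;
	uint8_t  xs_x87[16];
	uint8_t  xs_sse[16];
	uint8_t  xs_ymm_hi128[16];
};

/* True if [off, off + sz) lies inside a caller buffer of len bytes. */
static int
toy_covers(size_t len, size_t off, size_t sz)
{
	return len >= off + sz;
}

/* GET: copy no more than the caller asked for; old callers get a prefix. */
static size_t
toy_getxstate(void *buf, size_t len, const struct toy_xstate_v2 *kern)
{
	size_t n = len < sizeof(*kern) ? len : sizeof(*kern);

	memcpy(buf, kern, n);
	return n;
}

/* SET: replace only the components listed in xs_rfbm. */
static int
toy_setxstate(struct toy_xstate_v2 *kern, const void *buf, size_t len)
{
	struct toy_xstate_v2 in;

	memset(&in, 0, sizeof(in));
	if (len < sizeof(in.xs_rfbm))
		return -1;
	memcpy(&in, buf, len < sizeof(in) ? len : sizeof(in));

	/* Refuse components the caller asked for but did not supply. */
#define TOY_MISSING(bit, field) \
	((in.xs_rfbm & (bit)) && !toy_covers(len, \
	    offsetof(struct toy_xstate_v2, field), sizeof(in.field)))
	if (TOY_MISSING(TOY_XCR0_X87, xs_x87) ||
	    TOY_MISSING(TOY_XCR0_SSE, xs_sse) ||
	    TOY_MISSING(TOY_XCR0_YMM, xs_ymm_hi128))
		return -1;
#undef TOY_MISSING

	if (in.xs_rfbm & TOY_XCR0_X87)
		memcpy(kern->xs_x87, in.xs_x87, sizeof(kern->xs_x87));
	if (in.xs_rfbm & TOY_XCR0_SSE)
		memcpy(kern->xs_sse, in.xs_sse, sizeof(kern->xs_sse));
	if (in.xs_rfbm & TOY_XCR0_YMM)
		memcpy(kern->xs_ymm_hi128, in.xs_ymm_hi128,
		    sizeof(kern->xs_ymm_hi128));
	return 0;
}
```

An old caller built against the shorter layout keeps working after the structure grows, because it simply receives or supplies a prefix. This sketch additionally rejects a SET that names a component its buffer does not contain, which is one plausible policy rather than a documented one.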
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
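The alignment requirement behind this fault can be enforced by the compiler. FXSAVE/FXRSTOR raise #GP on an operand that is not 16-byte aligned, and XSAVE/XRSTOR require 64-byte alignment, so a stack buffer with `rdi % 16 = 8` faults. A sketch using C11 `alignas`, with a hypothetical `struct toy_savefpu` and `toy_mxcsr_probe()` (the 4096-byte size is an arbitrary upper bound for illustration, not the real save-area size):

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* FXSAVE needs 16-byte alignment; XSAVE needs 64, so use the stricter one. */
#define TOY_XSAVE_ALIGN 64

/* Hypothetical save area; the alignment travels with the type. */
struct toy_savefpu {
	alignas(TOY_XSAVE_ALIGN) uint8_t area[4096];
};

static int
toy_aligned(const void *p, uintptr_t a)
{
	return ((uintptr_t)p % a) == 0;
}

/* Stack buffer, as in a probe routine run early during CPU setup. */
static void
toy_mxcsr_probe(void)
{
	struct toy_savefpu fpusave;

	assert(toy_aligned(&fpusave, 16));		/* enough for FXSAVE */
	assert(toy_aligned(&fpusave, TOY_XSAVE_ALIGN));	/* needed for XSAVE */
	/* A real probe would execute fxsave/xsave into fpusave.area here. */
}
```

The compiler realigns the stack for an over-aligned local, so the guarantee no longer depends on how the caller's stack happened to be laid out.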
# 1.41	20-Jun-2018	jdolecek	As a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE, as it should be; only the Xen case checks the flag now. Need to investigate further why exactly the fault happens for the Xen no-xsave case pointed out by maxv.
# 1.40	19-Jun-2018	jdolecek	Fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication. Tested with Xen 4.2.6 DOM0/DOMU on an Intel CPU, without and with the no-xsave flag, so it should also work on those AMD CPUs which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3. Fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address. XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
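The difference between lazy and eager switching, and why INTEL-SA-00145 (Lazy FP State Restore) motivates the latter, shows up in a toy model with one FPU register per CPU. The types and functions are hypothetical; real switching saves and restores the full area with FXSAVE/XSAVE and their restore counterparts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one FPU "register" per CPU and one save slot per lwp. */
struct toy_lwp {
	double pcb_fpu;			/* saved FPU state in the PCB */
};

struct toy_cpu {
	double fpu_reg;			/* live hardware register */
	struct toy_lwp *fpu_owner;	/* lwp whose state is in fpu_reg */
	bool ts;			/* CR0.TS: next FPU use traps (lazy) */
};

/* Lazy switch: leave the old contents in place and only set TS. */
static void
toy_switch_lazy(struct toy_cpu *ci, struct toy_lwp *next)
{
	(void)next;	/* next's state is loaded later, from the DNA trap */
	ci->ts = true;	/* previous owner's state stays live in fpu_reg */
}

/* Eager switch: save the outgoing state and load the incoming one now. */
static void
toy_switch_eager(struct toy_cpu *ci, struct toy_lwp *next)
{
	if (ci->fpu_owner != NULL)
		ci->fpu_owner->pcb_fpu = ci->fpu_reg;
	ci->fpu_reg = next->pcb_fpu;
	ci->fpu_owner = next;
	ci->ts = false;
}
```

After the lazy switch the previous lwp's value is still live in the register, guarded only by CR0.TS; that residue is what the advisory says can be inferred through speculative execution. After the eager switch only the incoming lwp's state is in the register.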
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2, do float operation, switch lwp to cpu3, do float operation. Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
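The XSTATE_BV rule that 1.24 (and 1.18) rely on can be stated as a simplified model of the standard (non-compacted) form of XRSTOR: for each component requested in the restore mask, the component is loaded from memory if its XSTATE_BV bit is set and put into its init state otherwise. The types, the three-component layout and the all-zero init values are simplifications for illustration, and the `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NCOMP	3	/* x87, SSE, AVX */
#define TOY_INIT_VALUE	0u	/* in this toy every init state is 0 */

/* Simplified XSAVE area: one value per component plus the header bitmap. */
struct toy_xsave_area {
	uint64_t xstate_bv;		/* which components hold real data */
	uint32_t comp[TOY_NCOMP];	/* stand-ins for the component images */
};

/*
 * Simplified XRSTOR: for every component requested in rfbm, load it from
 * memory if XSTATE_BV says it is valid, otherwise put it in its init
 * state. Components not requested in rfbm are left untouched.
 */
static void
toy_xrstor(uint32_t regs[TOY_NCOMP], const struct toy_xsave_area *a,
    uint64_t rfbm)
{
	for (int i = 0; i < TOY_NCOMP; i++) {
		if (!(rfbm & (1ULL << i)))
			continue;
		regs[i] = (a->xstate_bv & (1ULL << i)) ?
		    a->comp[i] : TOY_INIT_VALUE;
	}
}

/* Editing the in-memory image must also mark the component as valid. */
static void
toy_set_component(struct toy_xsave_area *a, int i, uint32_t v)
{
	assert(i >= 0 && i < TOY_NCOMP);
	a->comp[i] = v;
	a->xstate_bv |= 1ULL << i;	/* otherwise XRSTOR ignores v */
}
```

This is why software that edits the in-memory image must set the corresponding XSTATE_BV bit: with the bit clear, XRSTOR discards the new value in favor of the init state.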
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
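Why a fixed mask loses DAZ (1.21) and what masking MXCSR protects against (1.19) can be shown in a few lines. The FXSAVE image reports MXCSR_MASK at byte offset 28; if that field is 0, Intel's SDM says to assume 0x0000FFBF, which has bit 6 (DAZ) clear. Bits outside the mask are reserved, and loading a reserved MXCSR bit with FXRSTOR/XRSTOR raises #GP. The `toy_*` names are hypothetical; this is a sketch, not the code in fpu.c.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MXCSR_DAZ		0x00000040u	/* bit 6: denormals-are-zero */
#define TOY_MXCSR_DEFAULT_MASK	0x0000ffbfu	/* SDM default if FXSAVE reports 0 */

/*
 * MXCSR_MASK as read from an FXSAVE image; 0 means the CPU does not
 * report the field and the documented default applies.
 */
static uint32_t
toy_mxcsr_mask(uint32_t fxsave_reported)
{
	return fxsave_reported != 0 ? fxsave_reported : TOY_MXCSR_DEFAULT_MASK;
}

/* Clear unsupported bits so a later FXRSTOR/XRSTOR cannot #GP. */
static uint32_t
toy_sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```

With the default mask a user-supplied DAZ bit is silently dropped; with a dynamically detected mask of 0xFFFF it survives, while reserved high bits are still cleared.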
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev 1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present: x86 without an FPU is not supported. Also delete the newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zerod), but I don't think the ymm registers are read/written unless they have been used. The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't pasrt of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code thate appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.62	04-Jun-2020	riastradh	Call clts/stts in fpu_kern_enter/leave so they work.
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	branches: 1.60.2; Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present: x86 without an FPU is not supported. Also delete the newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt), machdep.xsave_size = 832, machdep.xsave_features = 7. Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERT()s firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.61	31-Jan-2020	maxv	'oldlwp' is never NULL now, so remove the NULL checks.
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.60	27-Nov-2019	maxv	Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-0-RC1 netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
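Passing a `struct iovec` makes the caller state its buffer size. The kernel side can then copy only the smaller of the two sizes, so an old binary built against a shorter structure still gets a valid prefix.

Below is a small user-space model of that partial-read rule. `struct xstate_sketch` and `getxstate_sketch()` are hypothetical and are not NetBSD's `struct xstate` or its ptrace code. The field layout is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical kernel-side state; a later version may append fields. */
struct xstate_sketch {
	uint64_t xs_rfbm;
	uint64_t xs_xstate_bv;
	uint8_t  xs_ymm_hi128[16][16];
};

/*
 * Copy at most iov_len bytes, so callers that know only an older, shorter
 * structure receive a valid prefix. Returns the number of bytes copied.
 */
static size_t
getxstate_sketch(const struct iovec *iov, const struct xstate_sketch *src)
{
	size_t n = iov->iov_len < sizeof(*src) ? iov->iov_len : sizeof(*src);

	if (n > 0) {
		assert(iov->iov_base != NULL);
		memcpy(iov->iov_base, src, n);
	}
	return n;
}
```

A caller that only knows the first two 64-bit fields passes `iov_len = 16` and gets exactly those 16 bytes back.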
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear and fpu_save_area_reset -> fpu_sigreset. Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
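Revision 1.46 copies only the live part of the FPU area on fork, not the full PCB-sized buffer. The union is sized for the largest format, but the active mechanism may use far less. FNSAVE writes a 108-byte image in 32-bit protected mode.

In this sketch, `union fpu_area_sketch` and `fpu_area_fork_sketch()` are hypothetical, and the XSAVE member size is only illustrative. The real XSAVE size depends on the CPU; rev 1.7 reports machdep.xsave_size = 832 on the author's machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the PCB's FPU area, sized for the largest format. */
union fpu_area_sketch {
	uint8_t fsave[108];  /* FNSAVE image */
	uint8_t fxsave[512]; /* FXSAVE image */
	uint8_t xsave[2560]; /* XSAVE image; size chosen for illustration */
};

/* Copy only save_size bytes, the size used by the active save mechanism. */
static void
fpu_area_fork_sketch(union fpu_area_sketch *child,
    const union fpu_area_sketch *parent, size_t save_size)
{
	assert(save_size <= sizeof(*parent));
	memcpy(child, parent, save_size);
}
```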
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
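The 1.42 diagnosis (rdi % 16 = 8) is an alignment fault. FXSAVE requires a 16-byte aligned memory operand and XSAVE a 64-byte aligned one; a misaligned operand makes the instruction fault.

Below is a portable C11 sketch of the check and of a buffer whose alignment is declared explicitly rather than inherited from the stack layout. `is_aligned()` and `struct save_buf_sketch` are hypothetical names, not fpu.c code.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define FXSAVE_ALIGN 16 /* FXSAVE/FXRSTOR operand alignment */
#define XSAVE_ALIGN  64 /* XSAVE/XRSTOR operand alignment */

/* Same test as the "rdi % 16" check from the log message. */
static int
is_aligned(const void *p, uintptr_t align)
{
	return ((uintptr_t)p % align) == 0;
}

/* The compiler must honor this alignment even for a stack variable. */
struct save_buf_sketch {
	alignas(XSAVE_ALIGN) uint8_t area[1024];
};
```

A 64-byte-aligned buffer is automatically 16-byte aligned, so one declaration serves both FXSAVE and XSAVE.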
# 1.41	20-Jun-2018	jdolecek	As a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE, as it should be; only the Xen case checks the flag now. Need to investigate further why exactly the fault happens for the Xen no-xsave case pointed out by maxv.
# 1.40	19-Jun-2018	jdolecek	Fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication. Tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with the no-xsave flag, so it should also work on those AMD CPUs which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3. Fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address. XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. no FPU x86 is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zerod), but I don't think the ymm registers are read/written unless they have been used. The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't pasrt of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code thate appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.60	27-Nov-2019	maxv	Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
Revision tags: phil-wifi-20191119
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
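The entry says the MXCSR mask has to be detected at run time rather than hard-coded. According to the Intel SDM, after FXSAVE the MXCSR_MASK field sits at byte offset 28 of the 512-byte legacy area. A stored value of zero means the CPU predates the field, and the architectural default 0x0000FFBF applies. That default has bit 6 (DAZ) clear, which is consistent with the entry's note about losing DAZ. The sketch below assumes the caller already has an FXSAVE image in a buffer. `mxcsr_mask_from_fxsave` is a hypothetical helper, not the kernel's `fpuinit_mxcsr_mask`.

```c
#include <stdint.h>

/* Byte offset of MXCSR_MASK in the legacy FXSAVE region (Intel SDM). */
#define FXSAVE_MXCSR_MASK_OFF	28
/* Default mask for CPUs that leave the field zero; bit 6 (DAZ) is clear. */
#define MXCSR_MASK_DEFAULT	0x0000FFBFu

/*
 * Hypothetical helper: derive the MXCSR mask from a 512-byte image that
 * FXSAVE wrote. The real kernel executes FXSAVE itself; here the caller
 * supplies the image so the decoding logic can run anywhere.
 */
static uint32_t
mxcsr_mask_from_fxsave(const unsigned char fxsave_area[512])
{
	const unsigned char *p = fxsave_area + FXSAVE_MXCSR_MASK_OFF;
	/* The field is stored little-endian; assemble it byte by byte. */
	uint32_t mask = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

	return mask != 0 ? mask : MXCSR_MASK_DEFAULT;
}
```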
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
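The idea behind this entry can be illustrated as follows. FXRSTOR and XRSTOR raise #GP if the image sets a reserved MXCSR bit, so a value that comes from userland is filtered through the detected mask before it is loaded. In the sketch below, the mask is passed in as a parameter; the log later refers to it as `x86_fpu_mxcsr_mask`. Both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical check: would loading this MXCSR value fault? */
static int
mxcsr_has_reserved_bits(uint32_t mxcsr, uint32_t mxcsr_mask)
{
	return (mxcsr & ~mxcsr_mask) != 0;
}

/*
 * Hypothetical sanitizer: clear every bit the CPU does not support, so a
 * user-supplied value (e.g. from setcontext) can be restored safely.
 */
static uint32_t
sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mxcsr_mask)
{
	return user_mxcsr & mxcsr_mask;
}
```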
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
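Here is why XSTATE_BV matters in this case. When XRSTOR restores a requested component whose XSTATE_BV bit is 0, it loads that component's initial state instead of the memory contents. Writing the x87/SSE legacy region alone is therefore not enough. The sketch below assumes the standard (non-compacted) XSAVE layout: the 64-byte header follows the 512-byte legacy region, and XSTATE_BV is its first 64-bit field, with bit 0 for x87 and bit 1 for SSE. `mark_legacy_state_valid` is a hypothetical helper.

```c
#include <stdint.h>

#define XSAVE_HDR_OFF	512		/* header follows the legacy region */
#define XSTATE_BV_X87	(1ULL << 0)
#define XSTATE_BV_SSE	(1ULL << 1)

/*
 * Hypothetical helper: after filling the legacy region from a user
 * context, mark x87 and SSE as present so XRSTOR loads them from memory.
 */
static void
mark_legacy_state_valid(unsigned char *xsave_area)
{
	uint64_t bv = 0;
	int i;

	/* Read XSTATE_BV (little-endian, bytes 512..519). */
	for (i = 7; i >= 0; i--)
		bv = (bv << 8) | xsave_area[XSAVE_HDR_OFF + i];

	bv |= XSTATE_BV_X87 | XSTATE_BV_SSE;

	/* Write it back in little-endian order. */
	for (i = 0; i < 8; i++)
		xsave_area[XSAVE_HDR_OFF + i] = (unsigned char)(bv >> (8 * i));
}
```

Other XSTATE_BV bits, such as an AVX bit set by an earlier XSAVE, are preserved.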
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. FPU-less x86 is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
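The sysctl values quoted in this entry are mutually consistent. `xsave_features = 7` sets bits 0 to 2: x87, SSE and AVX. In the standard XSAVE layout, x87 and SSE live in the 512-byte legacy region, followed by a 64-byte header and then 256 bytes for the upper halves of YMM0-15. That gives 512 + 64 + 256 = 832. The sketch below only models bits 0 to 2. Real code obtains the size from CPUID leaf 0xD, and `xsave_size_for` is a hypothetical name.

```c
#include <stdint.h>

#define XSAVE_LEGACY_SIZE	512u	/* x87 + SSE legacy region */
#define XSAVE_HEADER_SIZE	64u	/* XSTATE_BV, XCOMP_BV, reserved */
#define XSAVE_AVX_SIZE		256u	/* upper 128 bits of YMM0-15 */
#define XFEATURE_AVX		(1ULL << 2)

/* Hypothetical: standard-format XSAVE size for features in bits 0..2 only. */
static unsigned
xsave_size_for(uint64_t features)
{
	unsigned size = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;

	if (features & XFEATURE_AVX)
		size += XSAVE_AVX_SIZE;
	return size;
}
```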
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.59	30-Oct-2019	maxv	Style.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
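This entry describes an alignment fault. FXSAVE requires a 16-byte-aligned operand, and XSAVE requires 64-byte alignment. A stack buffer at an address with `rdi % 16 = 8` therefore faults. Revision 1.86, listed earlier in this log, later raised the alignment of `savefpu` in `fpuinit_mxcsr_mask` to 64 bytes. The sketch below shows one C11 way to request that alignment for a stack object. `fake_savefpu` is a hypothetical stand-in for the kernel structure, not its real definition.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's save area: legacy region + header. */
struct fake_savefpu {
	alignas(64) unsigned char area[512 + 64];
};

static int
is_aligned(const void *p, uintptr_t align)
{
	return ((uintptr_t)p % align) == 0;
}

/* A stack instance, as in fpuinit_mxcsr_mask, must meet both requirements. */
static int
probe_alignment(void)
{
	struct fake_savefpu sf;

	return is_aligned(sf.area, 16) && is_aligned(sf.area, 64);
}
```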
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. no FPU x86 is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zerod), but I don't think the ymm registers are read/written unless they have been used. The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't pasrt of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code thate appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.58	12-Oct-2019	maxv	Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16.
# 1.57	04-Oct-2019	maxv	Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. No-FPU x86 is not supported. Also delete newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. no FPU x86 is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zerod), but I don't think the ymm registers are read/written unless they have been used. The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't pasrt of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code thate appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.56	03-Oct-2019	maxv	Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
Revision tags: netbsd-9-base
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
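Entries 1.21 and 1.19 concern MXCSR masking. A minimal sketch, assuming the Intel SDM rule that an MXCSR_MASK field of 0 in the FXSAVE image means the default mask 0x0000FFBF, and that DAZ is MXCSR bit 6 (0x40); the function names are hypothetical. It shows why a fixed conservative mask drops DAZ on CPUs that do support it, while masking with the detected value keeps it and still clears reserved bits.

```c
#include <stdint.h>

/* Mask to assume when FXSAVE reports MXCSR_MASK == 0 (Intel SDM). */
#define MXCSR_DEFAULT_MASK  0x0000FFBFu
/* Denormals-Are-Zeros control bit in MXCSR. */
#define MXCSR_DAZ           0x00000040u

/* Effective mask derived from the MXCSR_MASK field of an FXSAVE image. */
static uint32_t
effective_mxcsr_mask(uint32_t saved_mask_field)
{
	return saved_mask_field != 0 ? saved_mask_field : MXCSR_DEFAULT_MASK;
}

/* Drop reserved/unsupported bits from a user-supplied MXCSR value so a
 * later FXRSTOR/XRSTOR cannot fault on it. */
static uint32_t
sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```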
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev 1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
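Why 1.18 matters follows from how standard-form XRSTOR treats XSTATE_BV: for each component requested in RFBM it loads the saved value if the XSTATE_BV bit is set and otherwise resets the component to its init state; components outside RFBM are left alone. The toy model below assumes one 64-bit value per component and an all-zero init state; the types and `model_xrstor` are hypothetical.

```c
#include <stdint.h>

#define NCOMP 3  /* x87, SSE, AVX in this toy model */

/* Toy model: each "component" is a single 64-bit value. */
struct cpu_model  { uint64_t comp[NCOMP]; };
struct area_model { uint64_t comp[NCOMP]; uint64_t xstate_bv; };

static const uint64_t init_value[NCOMP] = { 0, 0, 0 };

/* Standard-form XRSTOR: for each requested component (RFBM bit set),
 * load it from memory if XSTATE_BV says it is there, otherwise reset
 * it to its init value.  Components outside RFBM are untouched. */
static void
model_xrstor(struct cpu_model *cpu, const struct area_model *a, uint64_t rfbm)
{
	for (int i = 0; i < NCOMP; i++) {
		uint64_t bit = UINT64_C(1) << i;
		if (!(rfbm & bit))
			continue;
		cpu->comp[i] = (a->xstate_bv & bit) ? a->comp[i] : init_value[i];
	}
}
```

Filling the x87/SSE fields from setcontext while leaving XSTATE_BV at 0 makes the restore silently discard them.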
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present: x86 without an FPU is not supported. Also delete the newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
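The sysctl values quoted in 1.7 can be checked by arithmetic: features 7 is x87|SSE|AVX (XCR0 bits 0 to 2), and 832 bytes is the 512-byte legacy region plus the 64-byte XSAVE header plus the 256-byte AVX (YMM upper halves) component at offset 576. A minimal sketch with hypothetical names; real code asks CPUID leaf 0xD for these sizes instead of hardcoding them.

```c
#include <stddef.h>
#include <stdint.h>

#define XCR0_X87  (UINT64_C(1) << 0)
#define XCR0_SSE  (UINT64_C(1) << 1)
#define XCR0_AVX  (UINT64_C(1) << 2)

/* Standard (non-compacted) XSAVE layout for the first three components. */
#define XSAVE_LEGACY_SIZE  512u  /* x87 + SSE (FXSAVE-compatible region) */
#define XSAVE_HEADER_SIZE   64u  /* XSTATE_BV, XCOMP_BV, reserved */
#define XSAVE_AVX_OFFSET   576u  /* YMM upper halves start here */
#define XSAVE_AVX_SIZE     256u  /* 16 registers x 16 upper bytes */

/* XSAVE area size for a feature mask; toy version that only knows
 * x87/SSE/AVX. */
static size_t
xsave_size_for(uint64_t features)
{
	size_t size = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;

	if (features & XCR0_AVX)
		size = XSAVE_AVX_OFFSET + XSAVE_AVX_SIZE;
	return size;
}
```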
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.55	05-Jul-2019	maxv	More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
# 1.54	26-Jun-2019	mgorny	Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
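The partial-update and size-bounded semantics described for PT_SETXSTATE/PT_GETXSTATE can be modelled as follows. This is a sketch under stated assumptions: `struct xstate_model` is hypothetical and is NOT the real `struct xstate` layout, each component is one toy 64-bit value, and "extending the structure" is assumed to mean appending fields, so a smaller caller buffer receives a prefix.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NCOMP 3

/* Hypothetical stand-in for 'struct xstate'; NOT the real layout. */
struct xstate_model {
	uint64_t xs_rfbm;      /* components the caller supplies */
	uint64_t comp[NCOMP];  /* one toy value per component */
};

/* PT_SETXSTATE-like merge: only components named in xs_rfbm (and
 * supported by the kernel) overwrite the saved LWP state. */
static void
model_setxstate(struct xstate_model *lwp, const struct xstate_model *user,
    uint64_t kernel_mask)
{
	uint64_t rfbm = user->xs_rfbm & kernel_mask;

	for (int i = 0; i < NCOMP; i++)
		if (rfbm & (UINT64_C(1) << i))
			lwp->comp[i] = user->comp[i];
}

/* PT_GETXSTATE-like copy-out bounded by the caller's iov_len, so an
 * older, smaller buffer gets a partial (prefix) read. */
static size_t
model_getxstate(void *buf, size_t iov_len, const struct xstate_model *lwp)
{
	size_t n = iov_len < sizeof(*lwp) ? iov_len : sizeof(*lwp);

	memcpy(buf, lwp, n);
	return n;
}
```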
Revision tags: phil-wifi-20190609
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
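The alignment bug behind 1.42 (rdi % 16 = 8) comes from FXSAVE/FXRSTOR raising #GP unless the save area is 16-byte aligned; XSAVE/XRSTOR need 64-byte alignment. A minimal sketch of a probe buffer declared with C11 `alignas(64)`; the names are hypothetical. Our understanding (compiler-dependent, not verified against the NetBSD toolchain) is that an over-aligned local like this makes the compiler realign the frame itself instead of trusting the incoming stack alignment.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* FXSAVE/FXRSTOR fault (#GP) unless the area is 16-byte aligned;
 * XSAVE/XRSTOR require 64-byte alignment. */
#define FXSAVE_ALIGN  16u
#define XSAVE_ALIGN   64u

static bool
is_aligned(const void *p, uintptr_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}

/* A 512-byte buffer suitable for probing MXCSR_MASK with FXSAVE; the
 * 64-byte alignment also satisfies XSAVE. */
static bool
probe_buffer_ok(void)
{
	alignas(64) uint8_t area[512];

	return is_aligned(area, FXSAVE_ALIGN) && is_aligned(area, XSAVE_ALIGN);
}
```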
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be; only the xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication; tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with the no-xsave flag, so it should also work on those AMD CPUs which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3; fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh; unbusy the current cpu's fpu; create a new fpu state in memory; install the state on the current cpu's fpu; splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zerod), but I don't think the ymm registers are read/written unless they have been used. The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't pasrt of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code thate appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.53	25-May-2019	maxv	Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. no FPU x86 is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.52	19-May-2019	maxv	Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change.
# 1.51	19-May-2019	maxv	Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3 fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. no FPU x86 is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zerod), but I don't think the ymm registers are read/written unless they have been used. The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
Revision tags: isaki-audio2-base
# 1.50	11-Feb-2019	cherry	We reorganise definitions for XEN source support as follows: XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
Revision tags: pgoyette-compat-20190127
# 1.49	20-Jan-2019	maxv	Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions.
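The "Align to 64 bytes" item above (and the FXSAVE fault traced to a misaligned stack in 1.42) follows from the architecture: XSAVE/XRSTOR raise #GP unless their memory operand is 64-byte aligned, and FXSAVE/FXRSTOR require 16-byte alignment. A minimal C11 sketch, assuming a hypothetical `struct xsave_buf` with an arbitrary 1024-byte size (not the kernel's actual save-area type), of attaching the alignment to the type so that stack copies get it too:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical save-area type. XSAVE/XRSTOR raise #GP unless the memory
 * operand is 64-byte aligned (FXSAVE/FXRSTOR need 16), so the alignment
 * is part of the type and therefore also applies to local variables. */
struct xsave_buf {
	alignas(64) unsigned char bytes[1024];
};

/* Return nonzero if p is aligned to 'align' bytes (a power of two). */
static int
is_aligned(const void *p, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t)p & (align - 1)) == 0;
}
```

A local `struct xsave_buf buf;` then satisfies `is_aligned(&buf, 64)`, whereas a plain `unsigned char` array on the stack carries no such guarantee.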
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.48	05-Oct-2018	maxv	export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
Revision tags: pgoyette-compat-0930
# 1.47	17-Sep-2018	maxv	Reduce the noise, reorder and rename some things for clarity.
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
# 1.46	01-Jul-2018	maxv	Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
# 1.45	01-Jul-2018	maxv	Use a switch, we can (and will) optimize each case separately. No functional change.
# 1.44	29-Jun-2018	maxv	Add more KASSERTs. Should help PR/53399.
Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.43	23-Jun-2018	maxv	Add XXX in fpuinit_mxcsr_mask.
# 1.42	22-Jun-2018	maxv	Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
# 1.41	20-Jun-2018	jdolecek	as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case pointed out by maxv
# 1.40	19-Jun-2018	jdolecek	fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication; tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3; fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address. XXX pullup netbsd-8
# 1.39	19-Jun-2018	maxv	When using EagerFPU, create the fpu state in execve at IPL_HIGH. A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs.
# 1.38	18-Jun-2018	maxv	Add more KASSERTs, see if they help PR/53383.
# 1.37	17-Jun-2018	maxv	No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
# 1.36	16-Jun-2018	maxv	Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs.
# 1.35	16-Jun-2018	maxv	Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
# 1.34	14-Jun-2018	maxv	Install the FPU state on the current CPU in setregs (execve).
# 1.33	14-Jun-2018	maxv	Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
# 1.32	23-May-2018	maxv	Add a comment about recent AMD CPUs.
# 1.31	23-May-2018	maxv	Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
# 1.30	23-May-2018	maxv	Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
# 1.29	23-May-2018	maxv	style
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.28	09-Feb-2018	maxv	branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
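Entries 1.19 and 1.21 describe sanitizing a user-supplied MXCSR before FXRSTOR/XRSTOR, using a mask detected at run time. A minimal sketch, assuming a raw 512-byte FXSAVE image as input: per the Intel SDM, MXCSR_MASK occupies bytes 28..31 of that image, and a stored value of zero means the default mask 0x0000FFBF, which lacks DAZ (bit 6); this is why a hard-coded mask can lose DAZ on CPUs that support it. The helper names are hypothetical, not NetBSD's:

```c
#include <assert.h>
#include <stdint.h>

#define FXSAVE_MXCSR_MASK_OFF	28		/* bytes 28..31 of the FXSAVE image */
#define MXCSR_MASK_DEFAULT	0x0000FFBFu	/* used when the CPU stores 0 (no DAZ) */

/* Hypothetical helper: read the little-endian MXCSR_MASK field from an
 * FXSAVE image, falling back to the architectural default if it is zero. */
static uint32_t
mxcsr_mask_from_fxsave(const unsigned char fx[512])
{
	uint32_t mask = (uint32_t)fx[FXSAVE_MXCSR_MASK_OFF]
	    | (uint32_t)fx[FXSAVE_MXCSR_MASK_OFF + 1] << 8
	    | (uint32_t)fx[FXSAVE_MXCSR_MASK_OFF + 2] << 16
	    | (uint32_t)fx[FXSAVE_MXCSR_MASK_OFF + 3] << 24;

	return mask != 0 ? mask : MXCSR_MASK_DEFAULT;
}

/* Hypothetical helper: clear reserved MXCSR bits supplied by userland so a
 * later FXRSTOR/XRSTOR cannot fault on them. */
static uint32_t
mxcsr_sanitize(uint32_t user_mxcsr, uint32_t mask)
{
	assert(mask != 0);
	return user_mxcsr & mask;
}
```

In a kernel, the mask would typically be detected once at boot by executing FXSAVE into a zeroed, suitably aligned buffer, with the sanitizing step applied wherever a user-provided MXCSR is copied into the save area.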
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
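Entry 1.18 (and the XSTATE_BV rule in 1.24) relies on how XRSTOR treats the XSAVE header: for each requested component whose XSTATE_BV bit is 0, XRSTOR loads that component's initial state instead of the bytes in memory. A minimal sketch, assuming a raw byte view of the save area in which XSTATE_BV is the little-endian 64-bit word at offset 512 (the start of the XSAVE header), with bit 0 for x87 and bit 1 for SSE; the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XSAVE_HDR_OFF	512			/* header follows the 512-byte legacy area */
#define XSTATE_X87	(UINT64_C(1) << 0)
#define XSTATE_SSE	(UINT64_C(1) << 1)

/* Read the little-endian XSTATE_BV word from the XSAVE header. */
static uint64_t
xstate_bv_get(const unsigned char *area)
{
	uint64_t bv = 0;

	for (int i = 0; i < 8; i++)
		bv |= (uint64_t)area[XSAVE_HDR_OFF + i] << (8 * i);
	return bv;
}

/* Hypothetical helper: after software fills in the legacy x87/SSE region,
 * set the matching XSTATE_BV bits so XRSTOR loads those bytes instead of
 * resetting the components to their initial state. Existing bits are kept. */
static void
xstate_bv_mark(unsigned char *area, uint64_t components)
{
	assert(area != NULL);
	uint64_t bv = xstate_bv_get(area) | components;

	for (int i = 0; i < 8; i++)
		area[XSAVE_HDR_OFF + i] = (unsigned char)(bv >> (8 * i));
}
```

Calling `xstate_bv_mark(area, XSTATE_X87 | XSTATE_SSE)` after a setcontext-style copy into a freshly zeroed area is the case 1.18 describes: without it, XSTATE_BV stays 0 and the copied registers would be ignored.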
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. x86 without an FPU is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.28	09-Feb-2018	maxv	Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. x86 without an FPU is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
Revision tags: tls-maxphys-base-20171202
# 1.27	11-Nov-2017	maxv	Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
# 1.26	11-Nov-2017	bouyer	Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. x86 without an FPU is not supported. Also delete newly unused send_sigill
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.25	08-Nov-2017	maxv	Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
# 1.24	04-Nov-2017	maxv	Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
# 1.23	04-Nov-2017	maxv	Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
# 1.22	04-Nov-2017	maxv	Fix xen. Not tested, but seems fine enough.
# 1.21	03-Nov-2017	maxv	Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
# 1.20	31-Oct-2017	maxv	Zero out the buffer entirely.
# 1.19	31-Oct-2017	maxv	Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
# 1.18	31-Oct-2017	maxv	Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
# 1.17	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
# 1.16	31-Oct-2017	maxv	Always use x86_fpu_save, clearer.
# 1.15	31-Oct-2017	maxv	Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. No-FPU x86 is not supported. Also delete newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.10; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
# 1.14	09-Oct-2017	maya	GC i386_fpu_present. No-FPU x86 is not supported. Also delete newly unused send_sigill.
# 1.13	17-Sep-2017	maxv	Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.10; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
# 1.12	29-Sep-2016	maxv	Remove outdated comments, typos, rename and reorder a few things.
Revision tags: localcount-20160914
# 1.11	18-Aug-2016	maxv	Simplify.
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.10	27-Nov-2014	uebayasi	branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
Revision tags: netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.10; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
# 1.8	23-Feb-2014	dsl	Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
# 1.7	23-Feb-2014	dsl	Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
# 1.6	15-Feb-2014	dsl	Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
# 1.5	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
# 1.4	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
# 1.3	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
# 1.2	12-Feb-2014	dsl	Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
# 1.1	11-Feb-2014	dsl	Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.