#
1.89 |
|
21-Jun-2024 |
riastradh |
x86/fpu.c: Nix trailing whitespace.
No functional change intended.
|
#
1.88 |
|
17-May-2024 |
manu |
Workaround panic: fpudna from userland
i386 Xen PV domUs get spurious fpudna traps from userland. The older eager FPU context switching code took care of ignoring them. When transitioning from eager switching to always switching, this special handling was removed, causing "fpudna from userland" panics.
This change restores the previous behavior, where fpudna traps from userland are ignored on Xen PV domU.
|
Revision tags: thorpej-ifq-base thorpej-altq-separation-base
|
#
1.87 |
|
18-Jul-2023 |
riastradh |
x86/fpu: In kernel mode fpu traps, print the instruction pointer.
|
#
1.86 |
|
03-Mar-2023 |
riastradh |
x86/fpu: Align savefpu to 64 bytes in fpuinit_mxcsr_mask.
16 bytes is not enough.
(Is this why it never worked on Xen some years back? Got lucky and accidentally had 64-byte alignment on native x86, but not in the call stack in Xen?)
XXX pullup-10
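A minimal sketch of why 16 bytes is not enough, assuming only the architectural rule that FXSAVE needs a 16-byte-aligned operand while XSAVE/XRSTOR need a 64-byte-aligned one. The struct and helper names are hypothetical, not the kernel's union savefpu; the point is that alignas(64) on the stack buffer makes the guarantee explicit instead of depending on the caller's frame.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for union savefpu: a 512-byte legacy FXSAVE
 * region followed by the 64-byte XSAVE header. */
struct savefpu_sketch {
	uint8_t legacy[512];
	uint8_t xsave_header[64];
};

/* FXSAVE needs a 16-byte-aligned operand; XSAVE/XRSTOR need 64. */
static bool
xsave_aligned(const void *p)
{
	return ((uintptr_t)p % 64) == 0;
}

/* A stack buffer like the one in fpuinit_mxcsr_mask: without the
 * alignas(64), its alignment depends on the caller's frame layout. */
static bool
stack_area_is_xsave_aligned(void)
{
	alignas(64) struct savefpu_sketch area;

	memset(&area, 0, sizeof(area));
	return xsave_aligned(&area);
}
```

An address such as 144 is fine for FXSAVE (144 % 16 == 0) but would fault with XSAVE, which matches the "got lucky on native x86" hypothesis above.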
|
#
1.85 |
|
03-Mar-2023 |
riastradh |
Revert "x86: Add kthread_fpu_enter/exit support, take two."
The kthread_fpu_enter/exit changes broke some hardware for reasons not yet understood; investigate before fixing and reapplying them.
|
#
1.84 |
|
03-Mar-2023 |
riastradh |
Revert "x86/fpu.c: Sprinkle KNF."
The kthread_fpu_enter/exit changes broke some hardware for reasons not yet understood; investigate before fixing and reapplying them.
|
#
1.83 |
|
25-Feb-2023 |
riastradh |
x86/fpu.c: Sprinkle KNF.
No functional change intended.
|
#
1.82 |
|
25-Feb-2023 |
riastradh |
x86: Add kthread_fpu_enter/exit support, take two.
This time, make sure to restore the FPU state when switching to a kthread in the middle of kthread_fpu_enter/exit.
This adds a single predicted-taken branch for the case of kthreads that are not in kthread_fpu_enter/exit, so it incurs a penalty only for threads that actually use it. Since it avoids FPU state switching in kthreads that do use the FPU, namely cgd worker threads, this should be a net performance win on systems using it and have negligible impact otherwise.
XXX pullup-10
|
#
1.81 |
|
25-Feb-2023 |
riastradh |
x86: Label boolean is_64bit argument to fpu_area_restore.
No functional change intended.
|
#
1.80 |
|
25-Feb-2023 |
riastradh |
x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use.
In fpu_kern_enter, make sure all the MXCSR exception status bits are set when we start using the FPU, so that instructions which exhibit MCDT are unaffected by it.
While here, zero all the other FPU registers in fpu_kern_enter.
In principle we could skip this step on future CPUs that fix the MCDT bug, but there's probably not much benefit -- workloads that do a lot of crypto in the kernel are probably better off using kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles in the first place.
For details, see: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
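A sketch of the MXCSR value this implies, assuming the standard MXCSR layout from the Intel SDM (bits 0-5 are the IE/DE/ZE/OE/UE/PE status flags, bits 7-12 the exception masks, 0x1F80 the power-on default). The exact constant fpu_kern_enter loads is not stated in the log, so treat "default masks plus all six status flags" as an assumption to check against fpu.c.

```c
#include <assert.h>
#include <stdint.h>

/* MXCSR fields per the Intel SDM. */
#define MXCSR_EXC_FLAGS	0x003fu	/* IE|DE|ZE|OE|UE|PE status bits */
#define MXCSR_EXC_MASKS	0x1f80u	/* all exceptions masked (power-on default) */

/* Value assumed for kernel FPU entry: exceptions masked, as usual, and
 * every status flag already set, so MCDT-affected instructions never
 * take the slow path of setting a flag for the first time. */
static uint32_t
kernel_entry_mxcsr(void)
{
	return MXCSR_EXC_MASKS | MXCSR_EXC_FLAGS;
}
```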
|
Revision tags: netbsd-10-base bouyer-sunxi-drm-base
|
#
1.79 |
|
20-Aug-2022 |
riastradh |
branches: 1.79.4; fpu_kern_enter/leave: Disable IPL assertions.
These don't work because mutex_enter/exit on a spin lock may raise an IPL but not lower it, if another spin lock was already held. For example,
	mutex_enter(some_lock_at_IPL_VM);
	printf("foo\n");
	fpu_kern_enter();
	...
	fpu_kern_leave();
	mutex_exit(some_lock_at_IPL_VM);
will trigger the panic, because printf takes a lock at IPL_HIGH where the IPL will remain until the mutex_exit. (This was a nightmare to track down before I remembered that detail of spin lock IPL semantics...)
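A toy model of the spin lock IPL rule described above, loosely modelled on NetBSD's per-CPU spin mutex bookkeeping. The IPL numbers, variable names, and functions are hypothetical; only the behavior (raise on every enter, restore only when the outermost lock is released) is taken from the log message.

```c
#include <assert.h>

/* Hypothetical IPL values; only their ordering matters here. */
enum { IPL_NONE = 0, IPL_VM = 6, IPL_SCHED = 7, IPL_HIGH = 8 };

static int cur_ipl = IPL_NONE;	/* current interrupt priority level */
static int spin_count = 0;	/* spin locks currently held on this CPU */
static int saved_ipl = IPL_NONE;	/* IPL before the outermost spin lock */

static void
spin_enter(int ipl)
{
	int s = cur_ipl;

	if (ipl > cur_ipl)
		cur_ipl = ipl;		/* raise, never lower */
	if (spin_count++ == 0)
		saved_ipl = s;		/* only the outermost lock remembers */
}

static void
spin_exit(void)
{
	assert(spin_count > 0);
	if (--spin_count == 0)
		cur_ipl = saved_ipl;	/* IPL drops only with the last lock */
}
```

Replaying the example: after printf's IPL_HIGH lock is released the level stays at IPL_HIGH, so an "ilevel <= IPL_VM" assertion in fpu_kern_enter would fire even though the caller only asked for IPL_VM.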
|
#
1.78 |
|
24-May-2022 |
andvar |
fix various typos in comments, docs and log messages.
|
#
1.77 |
|
01-Apr-2022 |
riastradh |
x86, arm: Allow fpu_kern_enter/leave while cold.
Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point.
Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited.
Also initialize cpu0's ci_kfpu_spl.
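A sketch of the reentrance check and the cold exemption, assuming fpu_kern_enter raises to IPL_VM and records the previous level in a per-CPU field. The struct, globals, and fake_* functions are hypothetical. A statically allocated cpu_info starts zeroed, and 0 looks like a legitimately saved IPL, which is why the field has to start at -1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IPL values; only their ordering matters here. */
enum { IPL_NONE = 0, IPL_VM = 6, IPL_HIGH = 8 };

/* Hypothetical per-CPU field mirroring ci_kfpu_spl:
 * -1 means "not inside fpu_kern_enter/leave". */
struct fake_cpu_info {
	int kfpu_spl;
};

static bool cold = true;	/* still booting: interrupts blocked anyway */
static int cur_ipl = IPL_NONE;

static void
fake_fpu_kern_enter(struct fake_cpu_info *ci)
{
	int s = cur_ipl;

	assert(cold || s <= IPL_VM);	/* above IPL_VM only while cold */
	if (cur_ipl < IPL_VM)
		cur_ipl = IPL_VM;	/* keep other FPU users out */
	assert(ci->kfpu_spl == -1);	/* no reentrance */
	ci->kfpu_spl = s;
}

static void
fake_fpu_kern_leave(struct fake_cpu_info *ci)
{
	assert(ci->kfpu_spl != -1);	/* must be paired with enter */
	int s = ci->kfpu_spl;

	ci->kfpu_spl = -1;
	cur_ipl = s;
}
```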
|
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
|
#
1.76 |
|
24-Oct-2020 |
mgorny |
Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs
When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area.
The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs.
The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code.
This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
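A sketch of the two FIP/FDP encodings described above. The union is a hypothetical reconstruction from this commit message (a 64-bit field versus 32-bit offset, 16-bit segment, 16-bit reserved); the field names are illustrative and may not match union fp_addr exactly. It shows what survives of a typical amd64 user instruction pointer in each form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reconstruction of union fp_addr. */
union fp_addr_sketch {
	uint64_t fa_64;			/* FXSAVE64/XSAVE64: full address */
	struct {
		uint32_t fa_off;	/* 32-bit offset */
		uint16_t fa_seg;	/* 16-bit segment selector */
		uint16_t fa_rsvd;	/* 16-bit reserved */
	} fa_32;			/* legacy FXSAVE form */
};

/* What a debugger can read back from each layout. */
static uint64_t
fip_from_fxsave64(uint64_t rip)
{
	union fp_addr_sketch a = { .fa_64 = rip };

	return a.fa_64;
}

static uint64_t
fip_from_legacy_fxsave(uint64_t rip)
{
	union fp_addr_sketch a;

	a.fa_32.fa_off = (uint32_t)rip;	/* upper 32 bits are lost */
	a.fa_32.fa_seg = 0;
	a.fa_32.fa_rsvd = 0;
	return a.fa_32.fa_off;
}
```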
|
#
1.75 |
|
15-Oct-2020 |
mgorny |
Revert "Merge convert_xmm_s87.c into fpu.c"
I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
|
#
1.74 |
|
02-Aug-2020 |
riastradh |
Revert "Add kthread_fpu_enter/exit support to x86." for now.
Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
|
#
1.73 |
|
01-Aug-2020 |
riastradh |
Add kthread_fpu_enter/exit support to x86.
|
#
1.72 |
|
20-Jul-2020 |
riastradh |
Fix fpu_kern_enter in a softint that interrupted a softint.
We need to find the lwp that was originally interrupted to save its fpu state.
With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
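A sketch of "find the lwp that was originally interrupted", assuming each softint lwp is flagged and points at the lwp it pinned. The names echo NetBSD's l_pflag, LP_INTR and l_switchto, but the struct and flag value here are simplified hypotheticals, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define LP_INTR_SKETCH	0x1	/* hypothetical: this lwp is a softint */

struct fake_lwp {
	int l_pflag;
	struct fake_lwp *l_switchto;	/* lwp this softint interrupted */
};

/* Walk back through nested softints to the lwp whose FPU state is
 * actually loaded on the CPU and must be saved. */
static struct fake_lwp *
fpu_owner_lwp(struct fake_lwp *l)
{
	while (l->l_pflag & LP_INTR_SKETCH) {
		assert(l->l_switchto != NULL);
		l = l->l_switchto;
	}
	return l;
}
```

Stopping at the first softint instead of walking the whole chain is the bug being fixed: when one softint interrupts another, the immediate predecessor holds no user FPU state.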
|
#
1.71 |
|
20-Jul-2020 |
riastradh |
Save fpu state at IPL_VM to exclude fpu_kern_enter/leave.
This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not.
(I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
|
#
1.70 |
|
20-Jul-2020 |
riastradh |
Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint."
This only fixed part of the race, and we can do it more simply.
|
#
1.69 |
|
20-Jul-2020 |
riastradh |
Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled."
This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
|
#
1.68 |
|
13-Jul-2020 |
riastradh |
Limit x86 fpu_kern_enter/leave to IPL_VM or below.
There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads.
This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation).
KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
	fpu_kern_enter();
	/* do FPU stuff */
	fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
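A sketch of why passing the size through struct iovec keeps old binaries working, assuming the kernel copies at most iov_len bytes. Both struct layouts are hypothetical stand-ins for struct xstate, and memcpy stands in for copyout; the real PT_GETXSTATE path is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>	/* struct iovec */

struct xstate_sketch_v2 {		/* hypothetical "current" layout */
	uint64_t xs_rfbm;		/* components present */
	uint8_t xs_ymm_hi128[256];
	uint8_t xs_new_component[64];	/* added in a later release */
};

struct xstate_sketch_v1 {		/* what an old program was built with */
	uint64_t xs_rfbm;
	uint8_t xs_ymm_hi128[256];
};

/* Kernel side of a GETXSTATE-like request: never write more than the
 * caller said its buffer holds. */
static size_t
getxstate_copyout(const struct xstate_sketch_v2 *kst, const struct iovec *iov)
{
	size_t n = iov->iov_len < sizeof(*kst) ? iov->iov_len : sizeof(*kst);

	memcpy(iov->iov_base, kst, n);	/* stands in for copyout() */
	return n;
}
```

Because the old layout is a prefix of the new one, the older caller gets a correct partial read instead of a buffer overrun.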
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
	fpu_save_area_clear -> fpu_clear
	fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
	XEN - common sources required for baseline XEN support.
	XENPV - sources required for support of XEN in PV mode.
	XENPVHVM - sources required for support of XEN in HVM mode.
	XENPVH - sources required for support of XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug: for some reason the stack is misaligned, which causes FXSAVE to fault in fpuinit_mxcsr_mask. This was seen in current-users@ yesterday (rdi % 16 = 8), and several months ago as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE, as it should be; only the xen case checks the flag now. Need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
	splhigh
	unbusy the current cpu's fpu
	create a new fpu state in memory
	install the state on the current cpu's fpu
	splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
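A toy model of eager switching as described above: the outgoing lwp's state is saved and the incoming lwp's state is loaded at every context switch, rather than waiting for a DNA trap as the lazy scheme does. Everything here is hypothetical; memcpy stands in for XSAVE/XRSTOR and a global struct stands in for the CPU's register file.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one live register file and one save area per lwp. */
struct fake_fpu { uint64_t regs[4]; };
struct fake_lwp { struct fake_fpu save; };

static struct fake_fpu cpu_fpu;	/* the "hardware" FPU state */

/* Eager switch: after this returns, the CPU holds only newl's values,
 * so nothing of oldl remains live for newl to observe speculatively. */
static void
eager_fpu_switch(struct fake_lwp *oldl, struct fake_lwp *newl)
{
	memcpy(&oldl->save, &cpu_fpu, sizeof(cpu_fpu));	/* like XSAVE */
	memcpy(&cpu_fpu, &newl->save, sizeof(cpu_fpu));	/* like XRSTOR */
}
```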
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
 - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
 - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
 - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
 - The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
	switch lwp to cpu2
	do float operation
	switch lwp to cpu3
	do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
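A sketch of dynamic MXCSR_MASK detection and its use when sanitizing a user-supplied MXCSR, based on the Intel SDM rule that a zero MXCSR_MASK field in the FXSAVE image means the default mask 0x0000FFBF. The function names are hypothetical. The default mask has bit 6 (DAZ) clear, which is exactly how a hardcoded mask loses DAZ on CPUs that support it.

```c
#include <assert.h>
#include <stdint.h>

/* Intel SDM: a zero MXCSR_MASK in the FXSAVE image means this default. */
#define MXCSR_MASK_DEFAULT	0x0000ffbfu

/* Effective mask, given the MXCSR_MASK field read back after FXSAVE. */
static uint32_t
effective_mxcsr_mask(uint32_t fxsave_mxcsr_mask)
{
	return fxsave_mxcsr_mask != 0 ? fxsave_mxcsr_mask : MXCSR_MASK_DEFAULT;
}

/* Clear bits the CPU does not implement before FXRSTOR/XRSTOR, which
 * fault on reserved MXCSR bits set by userland. */
static uint32_t
sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```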
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev 1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
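A sketch of the fix, assuming the architectural XRSTOR rule that a component whose XSTATE_BV bit is clear is put into its initial state and the bytes in memory are ignored. The component bit values (x87 = bit 0, SSE = bit 1, AVX = bit 2) are architectural; the header struct and function name are hypothetical trims of the real XSAVE area.

```c
#include <assert.h>
#include <stdint.h>

/* XSAVE state-component bits (XCR0 / XSTATE_BV). */
#define XCR0_X87	0x1u
#define XCR0_SSE	0x2u
#define XCR0_YMM_Hi128	0x4u

/* Hypothetical, trimmed XSAVE header: only the field used here. */
struct xsave_header_sketch {
	uint64_t xsh_xstate_bv;	/* components XRSTOR loads from memory */
};

/* After copying x87 and SSE registers from a ucontext into the area,
 * mark them present so XRSTOR actually loads them. */
static void
mark_legacy_state_valid(struct xsave_header_sketch *h)
{
	h->xsh_xstate_bv |= XCR0_X87 | XCR0_SSE;
}
```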
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
	machdep.fpu_save = 3 (implies xsaveopt)
	machdep.xsave_size = 832
	machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
 - Don't actually enable hardware interrupts unless we need to allow in IPIs.
 - There is no point in enabling them when they are blocked in software (by splhigh()).
 - Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.88 |
|
17-May-2024 |
manu |
iWorkaround panic: fpudna from userland
i386 Xen PV domU get spurious fpudna traps from userland. Older eager FPU contact switching code took care of ignoring them. When transitioning from eager switching to awlays switching, this special handling was removed, causing "fpudna from userland" panics.
This change restores the previosu behavior where fpudna traps from userland are ignored on Xen PV domU.
|
Revision tags: thorpej-ifq-base thorpej-altq-separation-base
|
#
1.87 |
|
18-Jul-2023 |
riastradh |
x86/fpu: In kernel mode fpu traps, print the instruction pointer.
|
#
1.86 |
|
03-Mar-2023 |
riastradh |
x86/fpu: Align savefpu to 64 bytes in fpuinit_mxcsr_mask.
16 bytes is not enough.
(Is this why it never worked on Xen some years back? Got lucky and accidentally had 64-byte alignment on native x86, but not in the call stack in Xen?)
XXX pullup-10
|
#
1.85 |
|
03-Mar-2023 |
riastradh |
Revert "x86: Add kthread_fpu_enter/exit support, take two."
kthread_fpu_enter/exit changes broke some hardware, unclear why, to investigate before fixing and reapplying these changes.
|
#
1.84 |
|
03-Mar-2023 |
riastradh |
Revert "x86/fpu.c: Sprinkle KNF."
kthread_fpu_enter/exit changes broke some hardware, unclear why, to investigate before fixing and reapplying these changes.
|
#
1.83 |
|
25-Feb-2023 |
riastradh |
x86/fpu.c: Sprinkle KNF.
No functional change intended.
|
#
1.82 |
|
25-Feb-2023 |
riastradh |
x86: Add kthread_fpu_enter/exit support, take two.
This time, make sure to restore the FPU state when switching to a kthread in the middle of kthread_fpu_enter/exit.
This adds a single predicted-taken branch for the case of kthreads that are not in kthread_fpu_enter/exit, so it incurs a penalty only for threads that actually use it. Since it avoids FPU state switching in kthreads that do use the FPU, namely cgd worker threads, this should be a net performance win on systems using it and have negligible impact otherwise.
XXX pullup-10
|
#
1.81 |
|
25-Feb-2023 |
riastradh |
x86: Label boolean is_64bit argument to fpu_area_restore.
No functional change intended.
|
#
1.80 |
|
25-Feb-2023 |
riastradh |
x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use.
In fpu_kern_enter, make sure all the MXCSR exception status bits are set when we start using the FPU, so that instructions which exhibit MCDT are unaffected by it.
While here, zero all the other FPU registers in fpu_kern_enter.
In principle we could skip this step on future CPUs that fix the MCDT bug, but there's probably not much benefit -- workloads that do a lot of crypto in the kernel are probably better off using kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles in the first place.
For details, see: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
|
Revision tags: netbsd-10-base bouyer-sunxi-drm-base
|
#
1.79 |
|
20-Aug-2022 |
riastradh |
branches: 1.79.4; fpu_kern_enter/leave: Disable IPL assertions.
These don't work because mutex_enter/exit on a spin lock may raise an IPL but not lower it, if another spin lock was already held. For example,
mutex_enter(some_lock_at_IPL_VM); printf("foo\n"); fpu_kern_enter(); ... fpu_kern_leave(); mutex_exit(some_lock_at_IPL_VM);
will trigger the panic, because printf takes a lock at IPL_HIGH where the IPL wil remain until the mutex_exit. (This was a nightmare to track down before I remembered that detail of spin lock IPL semantics...)
|
#
1.78 |
|
24-May-2022 |
andvar |
fix various typos in comments, docs and log messages.
|
#
1.77 |
|
01-Apr-2022 |
riastradh |
x86, arm: Allow fpu_kern_enter/leave while cold.
Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point.
Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited.
Also initialize cpu0's ci_kfpu_spl.
|
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
|
#
1.76 |
|
24-Oct-2020 |
mgorny |
Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs
When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area.
The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs.
The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code.
This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
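The layout difference can be sketched as a C union. The `fa_64` and `fa_32` member names follow the log message; the inner field names and the two save helpers are illustrative rather than NetBSD's actual definitions, and the helpers only emulate what each instruction variant records.

```c
#include <assert.h>
#include <stdint.h>

/* fa_64 / fa_32 follow the log message; inner names are illustrative. */
union fp_addr_model {
	uint64_t fa_64;			/* FXSAVE64: full 64-bit address */
	struct {
		uint32_t fa_off;	/* legacy FXSAVE: 32-bit offset */
		uint16_t fa_seg;	/* 16-bit segment selector */
		uint16_t fa_rsvd;	/* 16-bit reserved */
	} fa_32;
};

static_assert(sizeof(union fp_addr_model) == 8, "both views are 8 bytes");

/* Emulate what the 64-bit variant records for an instruction pointer. */
static void
save_fip64(union fp_addr_model *a, uint64_t rip)
{
	a->fa_64 = rip;
}

/* Emulate the legacy variant: only the low 32 bits of the address survive. */
static void
save_fip32(union fp_addr_model *a, uint64_t rip, uint16_t cs)
{
	a->fa_32.fa_off = (uint32_t)rip;
	a->fa_32.fa_seg = cs;
	a->fa_32.fa_rsvd = 0;
}
```

Saving 0x00007fffdeadbeef through the legacy path keeps only 0xdeadbeef, which is the truncation the log message refers to.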
|
#
1.75 |
|
15-Oct-2020 |
mgorny |
Revert "Merge convert_xmm_s87.c into fpu.c"
I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
|
#
1.74 |
|
02-Aug-2020 |
riastradh |
Revert "Add kthread_fpu_enter/exit support to x86." for now.
Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
|
#
1.73 |
|
01-Aug-2020 |
riastradh |
Add kthread_fpu_enter/exit support to x86.
|
#
1.72 |
|
20-Jul-2020 |
riastradh |
Fix fpu_kern_enter in a softint that interrupted a softint.
We need to find the lwp that was originally interrupted to save its fpu state.
With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
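The fix amounts to walking from the current softint back to the lwp whose FPU state is actually loaded on the CPU. A minimal sketch, assuming each softint lwp records the lwp it interrupted in a `switchto` pointer; the structure and field names are hypothetical and greatly simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down lwp: only what the walk needs. */
struct lwp_model {
	const char *name;
	bool softint;			/* true for a softint lwp */
	struct lwp_model *switchto;	/* lwp this softint interrupted */
};

/*
 * Follow the chain of interrupted lwps until reaching one that is not
 * a softint: that lwp owns the FPU state currently on the CPU.
 */
static struct lwp_model *
fpu_owner(struct lwp_model *l)
{
	assert(l != NULL);
	while (l->softint && l->switchto != NULL)
		l = l->switchto;
	return l;
}
```

Stopping after one hop would pick the first softint instead of the user lwp when one softint interrupts another, which is the case this entry fixes.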
|
#
1.71 |
|
20-Jul-2020 |
riastradh |
Save fpu state at IPL_VM to exclude fpu_kern_enter/leave.
This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not.
(I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
|
#
1.70 |
|
20-Jul-2020 |
riastradh |
Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint."
This only fixed part of the race, and we can do it more simply.
|
#
1.69 |
|
20-Jul-2020 |
riastradh |
Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled."
This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
|
#
1.68 |
|
13-Jul-2020 |
riastradh |
Limit x86 fpu_kern_enter/leave to IPL_VM or below.
There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads.
This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation).
KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
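A minimal sketch of the idea: keep one all-zero register image in read-only data and load it, instead of building a zeroed buffer on the stack each time. The 512-byte size (the legacy FXSAVE area) and all names are assumptions, and `memcpy` stands in for the real restore instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FPU_AREA_SIZE	512	/* assumed: legacy FXSAVE area */

/* Hypothetical register file modelled as raw bytes. */
struct fpu_regs_model {
	uint8_t bytes[FPU_AREA_SIZE];
};

static_assert(sizeof(struct fpu_regs_model) == FPU_AREA_SIZE, "no padding");

/* One zero image in read-only data, shared by all callers. */
static const struct fpu_regs_model zero_fpregs;

/* Stand-in for restoring the zero image into the live registers. */
static void
model_zero_fpregs(struct fpu_regs_model *live)
{
	memcpy(live, &zero_fpregs, sizeof(*live));
}
```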
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
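The size-negotiation and partial-update rules can be modelled in a few lines of C. This is a hypothetical, heavily simplified stand-in for `struct xstate`, which has many more fields. The component bits follow the XSAVE numbering (bit 0 for x87, bit 2 for the upper halves of the YMM registers), and the helpers illustrate the semantics only, not the kernel's code path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XS_X87	(1u << 0)	/* XSAVE component 0 */
#define XS_YMM	(1u << 2)	/* XSAVE component 2: upper YMM halves */

/* Hypothetical, much-simplified stand-in for struct xstate. */
struct xstate_model {
	uint64_t xs_rfbm;		/* components the caller supplies */
	uint8_t xs_x87[16];		/* illustrative payloads */
	uint8_t xs_ymm_hi128[32];
};

/*
 * GET side: copy no more than the caller's declared size, so a caller
 * built against a smaller structure still receives a valid prefix.
 */
static size_t
model_getxstate(const struct xstate_model *kern, void *ubuf, size_t ulen)
{
	size_t n = ulen < sizeof(*kern) ? ulen : sizeof(*kern);

	assert(ubuf != NULL || n == 0);
	if (n > 0)
		memcpy(ubuf, kern, n);
	return n;
}

/* SET side: replace only the components named in xs_rfbm. */
static void
model_setxstate(struct xstate_model *kern, const struct xstate_model *user)
{
	if (user->xs_rfbm & XS_X87)
		memcpy(kern->xs_x87, user->xs_x87, sizeof(kern->xs_x87));
	if (user->xs_rfbm & XS_YMM)
		memcpy(kern->xs_ymm_hi128, user->xs_ymm_hi128,
		    sizeof(kern->xs_ymm_hi128));
}
```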
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
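A minimal sketch of the variable-sized copy, assuming a PCB whose embedded area is sized for the largest format while a runtime variable holds the size actually in use. The 108-byte FNSAVE image size is architectural (32-bit protected mode); the 1024-byte maximum and all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FPU_AREA_MAX	1024	/* illustrative "biggest static state" */
#define FNSAVE_SIZE	108	/* 32-bit protected-mode FNSAVE image */

struct pcb_model {
	int pcb_flags;				/* other PCB contents */
	unsigned char pcb_fpu[FPU_AREA_MAX];	/* sized for the largest format */
};

/* Hypothetical: chosen at boot from the save mechanism in use. */
static size_t fpu_save_size = FNSAVE_SIZE;

/* Copy only the bytes the current save format actually uses. */
static void
fpu_copy_state(struct pcb_model *dst, const struct pcb_model *src)
{
	assert(fpu_save_size <= sizeof(dst->pcb_fpu));
	memcpy(dst->pcb_fpu, src->pcb_fpu, fpu_save_size);
}
```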
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
- splhigh
- unbusy the current cpu's fpu
- create a new fpu state in memory
- install the state on the current cpu's fpu
- splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
- switch lwp to cpu2
- do float operation
- switch lwp to cpu3
- do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
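The XSTATE_BV rule above can be sketched for the x87 control word alone. This assumes the "standard FPU value" equals the hardware x87 initial FCW, 0x037F; the structure and function names are hypothetical, and the model ignores every other part of the x87 state. XRSTOR loads component 0 from memory only if XSTATE_BV[0] is set; otherwise it puts x87 in its initial state, which keeps XINUSE[0] at 0 so XSAVEOPT can skip it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XCR0_X87	0x1u	/* XSAVE state component 0 */
#define X87_INIT_CW	0x037fu	/* FCW in the x87 initial state */

/* Hypothetical, cut-down in-memory XSAVE image. */
struct xsave_model {
	uint16_t fx_cw;		/* x87 control word, part of component 0 */
	uint64_t xstate_bv;	/* header: components valid in memory */
};

/* What XRSTOR would load into FCW for this image. */
static uint16_t
model_xrstor_cw(const struct xsave_model *xs)
{
	return (xs->xstate_bv & XCR0_X87) ? xs->fx_cw : X87_INIT_CW;
}

/*
 * Edit the in-memory control word. Only a non-default value needs
 * XSTATE_BV[0] forced on; the default is what XRSTOR yields anyway.
 */
static void
set_x87_cw(struct xsave_model *xs, uint16_t cw)
{
	assert(xs != NULL);
	xs->fx_cw = cw;
	if (cw != X87_INIT_CW)
		xs->xstate_bv |= XCR0_X87;
}
```

The first check in the test shows the failure mode: editing `fx_cw` without touching XSTATE_BV leaves XRSTOR loading the initial value.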
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
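A minimal sketch of why the mask must be detected rather than hard-coded. Per the Intel SDM, the MXCSR_MASK field of an FXSAVE image reads as zero on processors that predate the field, in which case the default mask 0x0000FFBF applies. That default has bit 6 (DAZ) clear, so a mask hard-coded to it would strip DAZ even on CPUs that support it. The function names are hypothetical; the masking step corresponds to the sanitizing added in 1.19.

```c
#include <assert.h>
#include <stdint.h>

#define MXCSR_DAZ		0x0040u	/* denormals-are-zero, bit 6 */
#define MXCSR_DEFAULT_MASK	0xffbfu	/* used when the field reads 0 */

static_assert((MXCSR_DEFAULT_MASK & MXCSR_DAZ) == 0, "default lacks DAZ");

/* Effective mask from the MXCSR_MASK field of an FXSAVE image. */
static uint32_t
effective_mxcsr_mask(uint32_t fxsave_mxcsr_mask)
{
	return fxsave_mxcsr_mask != 0 ? fxsave_mxcsr_mask : MXCSR_DEFAULT_MASK;
}

/* Clear reserved bits in a user-supplied MXCSR before it is loaded. */
static uint32_t
sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```

In this model a CPU that reports 0xFFFF keeps a user's DAZ bit, while the default mask silently drops it, which is the feature loss the log message describes.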
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
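The reported numbers are consistent with the standard XSAVE layout, which a short C sketch can check. Feature bits 0-2 are x87, SSE and AVX, and the area is the 512-byte legacy region plus the 64-byte header plus 256 bytes for the upper halves of YMM0-15. The helper is illustrative only. Real code should take sizes from CPUID leaf 0xD rather than hard-coding them, and this sketch ignores every component above AVX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XSAVE state-component bits (XCR0 / XSTATE_BV numbering). */
#define XSAVE_X87	(1u << 0)
#define XSAVE_SSE	(1u << 1)
#define XSAVE_AVX	(1u << 2)	/* upper halves of YMM0-15 */

#define XSAVE_LEGACY_SIZE	512	/* FXSAVE-compatible region */
#define XSAVE_HEADER_SIZE	64	/* XSAVE header */
#define XSAVE_AVX_SIZE		256	/* 16 registers x 16 bytes */

static_assert(XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE == 576, "AVX offset");

/* Standard-format size for a feature set using at most x87|SSE|AVX. */
static size_t
xsave_size_for(uint64_t features)
{
	size_t sz = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;

	if (features & XSAVE_AVX)
		sz += XSAVE_AVX_SIZE;
	return sz;
}
```

With `xsave_features = 7` (x87, SSE and AVX) this gives 832, matching the sysctl output quoted above.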
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.87 |
|
18-Jul-2023 |
riastradh |
x86/fpu: In kernel mode fpu traps, print the instruction pointer.
|
#
1.86 |
|
03-Mar-2023 |
riastradh |
x86/fpu: Align savefpu to 64 bytes in fpuinit_mxcsr_mask.
16 bytes is not enough.
(Is this why it never worked on Xen some years back? Got lucky and accidentally had 64-byte alignment on native x86, but not in the call stack in Xen?)
XXX pullup-10
|
#
1.85 |
|
03-Mar-2023 |
riastradh |
Revert "x86: Add kthread_fpu_enter/exit support, take two."
kthread_fpu_enter/exit changes broke some hardware, unclear why, to investigate before fixing and reapplying these changes.
|
#
1.84 |
|
03-Mar-2023 |
riastradh |
Revert "x86/fpu.c: Sprinkle KNF."
kthread_fpu_enter/exit changes broke some hardware, unclear why, to investigate before fixing and reapplying these changes.
|
#
1.83 |
|
25-Feb-2023 |
riastradh |
x86/fpu.c: Sprinkle KNF.
No functional change intended.
|
#
1.82 |
|
25-Feb-2023 |
riastradh |
x86: Add kthread_fpu_enter/exit support, take two.
This time, make sure to restore the FPU state when switching to a kthread in the middle of kthread_fpu_enter/exit.
This adds a single predicted-taken branch for the case of kthreads that are not in kthread_fpu_enter/exit, so it incurs a penalty only for threads that actually use it. Since it avoids FPU state switching in kthreads that do use the FPU, namely cgd worker threads, this should be a net performance win on systems using it and have negligible impact otherwise.
XXX pullup-10
|
#
1.81 |
|
25-Feb-2023 |
riastradh |
x86: Label boolean is_64bit argument to fpu_area_restore.
No functional change intended.
|
#
1.80 |
|
25-Feb-2023 |
riastradh |
x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use.
In fpu_kern_enter, make sure all the MXCSR exception status bits are set when we start using the FPU, so that instructions which exhibit MCDT are unaffected by it.
While here, zero all the other FPU registers in fpu_kern_enter.
In principle we could skip this step on future CPUs that fix the MCDT bug, but there's probably not much benefit -- workloads that do a lot of crypto in the kernel are probably better off using kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles in the first place.
For details, see: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
|
Revision tags: netbsd-10-base bouyer-sunxi-drm-base
|
#
1.79 |
|
20-Aug-2022 |
riastradh |
fpu_kern_enter/leave: Disable IPL assertions.
These don't work because mutex_enter/exit on a spin lock may raise an IPL but not lower it, if another spin lock was already held. For example,
mutex_enter(some_lock_at_IPL_VM); printf("foo\n"); fpu_kern_enter(); ... fpu_kern_leave(); mutex_exit(some_lock_at_IPL_VM);
will trigger the panic, because printf takes a lock at IPL_HIGH where the IPL wil remain until the mutex_exit. (This was a nightmare to track down before I remembered that detail of spin lock IPL semantics...)
|
#
1.78 |
|
24-May-2022 |
andvar |
fix various typos in comments, docs and log messages.
|
#
1.77 |
|
01-Apr-2022 |
riastradh |
x86, arm: Allow fpu_kern_enter/leave while cold.
Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point.
Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited.
Also initialize cpu0's ci_kfpu_spl.
|
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
|
#
1.76 |
|
24-Oct-2020 |
mgorny |
Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs
When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area.
The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs.
The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code.
This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
|
#
1.75 |
|
15-Oct-2020 |
mgorny |
Revert "Merge convert_xmm_s87.c into fpu.c"
I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
|
#
1.74 |
|
02-Aug-2020 |
riastradh |
Revert "Add kthread_fpu_enter/exit support to x86." for now.
Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
|
#
1.73 |
|
01-Aug-2020 |
riastradh |
Add kthread_fpu_enter/exit support to x86.
|
#
1.72 |
|
20-Jul-2020 |
riastradh |
Fix fpu_kern_enter in a softint that interrupted a softint.
We need to find the lwp that was originally interrupted to save its fpu state.
With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
|
#
1.71 |
|
20-Jul-2020 |
riastradh |
Save fpu state at IPL_VM to exclude fpu_kern_enter/leave.
This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not.
(I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
|
#
1.70 |
|
20-Jul-2020 |
riastradh |
Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint."
This only fixed part of the race, and we can do it more simply.
|
#
1.69 |
|
20-Jul-2020 |
riastradh |
Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled."
This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
|
#
1.68 |
|
13-Jul-2020 |
riastradh |
Limit x86 fpu_kern_enter/leave to IPL_VM or below.
There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads.
This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation).
KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
	fpu_kern_enter();
	/* do FPU stuff */
	fpu_kern_leave();
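The calling convention can be sketched with a userland mock. The `toy_` functions are hypothetical stand-ins, not the kernel API. They model only the enter/leave pairing and the no-reentrance rule. Other entries in this log describe the real functions doing more, such as clts/stts, IPL handling and saving the interrupted lwp's state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU flag: are we between enter and leave? */
static bool toy_kfpu_active;

/* Model of fpu_kern_enter: report reentry instead of panicking. */
static int
toy_fpu_kern_enter(void)
{
	if (toy_kfpu_active)
		return -1;	/* nested use is prohibited */
	toy_kfpu_active = true;
	return 0;
}

/* Model of fpu_kern_leave: must pair with a successful enter. */
static int
toy_fpu_kern_leave(void)
{
	if (!toy_kfpu_active)
		return -1;	/* leave without enter */
	toy_kfpu_active = false;
	return 0;
}

/* Example caller following the enter / do FPU stuff / leave pattern. */
static double
toy_scale(double x)
{
	double y;

	if (toy_fpu_kern_enter() != 0)
		return x;
	y = x * 2.0;	/* "do FPU stuff" */
	toy_fpu_kern_leave();
	return y;
}
```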
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
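The partial-update rule for PT_SETXSTATE can be illustrated with a toy model. `struct toy_xstate`, its one-word components and `toy_apply_xstate` are hypothetical and far simpler than the real `struct xstate`. Only the rule comes from the entry: a component is replaced when its bit is set in `xs_rfbm`. Bits 0, 1 and 2 stand for x87, SSE and AVX, following the XSAVE component numbering.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NCOMP 3	/* x87, SSE, AVX (XSAVE components 0..2) */

/* Hypothetical, heavily simplified stand-in for 'struct xstate'. */
struct toy_xstate {
	uint64_t xs_rfbm;		/* components the caller supplies */
	uint64_t comp[TOY_NCOMP];	/* one word per component */
};

/* Replace only the components selected by src->xs_rfbm. */
static void
toy_apply_xstate(uint64_t dst[TOY_NCOMP], const struct toy_xstate *src)
{
	for (int i = 0; i < TOY_NCOMP; i++) {
		if (src->xs_rfbm & ((uint64_t)1 << i))
			dst[i] = src->comp[i];
	}
}
```

A debugger that only wants to change AVX state would set just bit 2, leaving the x87 and SSE state untouched.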
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
	fpu_save_area_clear -> fpu_clear
	fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
	XEN - common sources required for baseline XEN support.
	XENPV - sources required for support of XEN in PV mode.
	XENPVHVM - sources required for support for XEN in HVM mode.
	XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
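This change copies only as many bytes as the active save format needs. The struct, `toy_fork_fpu` and the 1024-byte slot (standing in for the largest static state embedded in the PCB) are hypothetical. 108 bytes is the 32-bit FNSAVE image size and 512 bytes the FXSAVE image size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_FNSAVE_SIZE	108	/* legacy x87 FNSAVE image */
#define TOY_FXSAVE_SIZE	512	/* FXSAVE image */
#define TOY_MAX_SIZE	1024	/* hypothetical largest static state */

/* Hypothetical PCB slot sized for the biggest supported format. */
struct toy_pcb {
	unsigned char fpu_area[TOY_MAX_SIZE];
};

/* Copy just the bytes of the format actually in use on this CPU,
 * instead of the whole (mostly unused) slot. */
static void
toy_fork_fpu(struct toy_pcb *child, const struct toy_pcb *parent,
    size_t state_size)
{
	memcpy(child->fpu_area, parent->fpu_area, state_size);
}
```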
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch; we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, namely that for some reason the stack is misaligned, which causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
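FXSAVE requires a 16-byte-aligned memory operand and XSAVE a 64-byte-aligned one. An address with `rdi % 16 = 8` therefore faults. 1.86 later aligned the buffer in this function to 64 bytes. The helper and buffer names below are hypothetical. The sketch only shows how such an address arises and how a C11 `_Alignas` avoids it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if p is a multiple of align (align must be a power of two).
 * FXSAVE needs 16-byte alignment; XSAVE needs 64-byte alignment. */
static bool
toy_aligned(const void *p, uintptr_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}

/* A save area forced to 64-byte alignment, as XSAVE requires. */
static _Alignas(64) unsigned char toy_save_area[576 + 64];
```

Offsetting a correctly aligned buffer by 8 bytes reproduces the failing case from the report.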
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE, as it should be; only the xen case checks the flag now. Need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
	splhigh
	unbusy the current cpu's fpu
	create a new fpu state in memory
	install the state on the current cpu's fpu
	splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, which are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
 - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
 - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
 - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
 - The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
	switch lwp to cpu2
	do float operation
	switch lwp to cpu3
	do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
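A simplified model of the two XSAVEOPT optimizations this entry relies on. `toy_xsaveopt` and its inputs are hypothetical. The real conditions for the modified optimization are stricter (the save must use the same linear address as the last XRSTOR, among other requirements; see the Intel SDM). For each requested component, the model applies three rules:

- If XINUSE is 0, the component is not written and its XSTATE_BV bit is 0.
- If the component is in use but unmodified, the write is skipped.
- Otherwise, the component is written.

```c
#include <assert.h>
#include <stdint.h>

/* Outcome of one modeled XSAVEOPT. */
struct toy_save_result {
	uint64_t written;	/* components actually stored to memory */
	uint64_t xstate_bv;	/* XSTATE_BV bits for the requested mask */
};

static struct toy_save_result
toy_xsaveopt(uint64_t rfbm, uint64_t xinuse, uint64_t modified)
{
	struct toy_save_result r = { 0, 0 };

	for (int i = 0; i < 64; i++) {
		uint64_t bit = (uint64_t)1 << i;

		if (!(rfbm & bit))
			continue;	/* not requested */
		if (!(xinuse & bit))
			continue;	/* init optimization: BV bit stays 0 */
		r.xstate_bv |= bit;	/* component is in use */
		if (modified & bit)
			r.written |= bit;	/* unmodified ones are skipped */
	}
	return r;
}
```

With x87 left in its initial state (the point of the fpu_save_area_clear change), only modified in-use components cost a memory write.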
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK: it needs to be detected dynamically; otherwise, when masking MXCSR, we lose some features (e.g. DAZ).
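The detection and the masking from 1.19 can be sketched together. The helper names are hypothetical. The constants follow the Intel SDM: FXSAVE reports MXCSR_MASK, a reported value of 0 means the default mask 0x0000FFBF, and DAZ is bit 6 (0x40). A hard-coded default mask therefore clears DAZ even on CPUs that support it.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MXCSR_DAZ		0x00000040u	/* denormals-are-zero, bit 6 */
#define TOY_DEFAULT_MXCSR_MASK	0x0000FFBFu	/* used when FXSAVE reports 0 */

/* Pick the mask from the MXCSR_MASK field of an FXSAVE image. */
static uint32_t
toy_mxcsr_mask(uint32_t fxsave_mask_field)
{
	return fxsave_mask_field != 0 ? fxsave_mask_field :
	    TOY_DEFAULT_MXCSR_MASK;
}

/* Strip bits the CPU does not support before FXRSTOR/XRSTOR, so a
 * value supplied by userland cannot make the restore fault. */
static uint32_t
toy_sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```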
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
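The toy restore below shows why the bits matter. XRSTOR loads a requested component from memory only if its XSTATE_BV bit is set. Otherwise it puts the component in its initial configuration. `toy_xrstor`, the one-word components and the use of 0 for the initial value are hypothetical simplifications. Component 0 is x87 and component 1 is SSE.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_X87	((uint64_t)1 << 0)
#define TOY_SSE	((uint64_t)1 << 1)

/* Hypothetical memory image: header bitmap plus one word per component. */
struct toy_xsave_area {
	uint64_t xstate_bv;
	uint64_t comp[2];
};

/* Model of XRSTOR with RFBM = x87|SSE: a set XSTATE_BV bit loads the
 * component from memory; a clear bit means "initial state" (0 here). */
static void
toy_xrstor(uint64_t regs[2], const struct toy_xsave_area *a)
{
	for (int i = 0; i < 2; i++)
		regs[i] = (a->xstate_bv & ((uint64_t)1 << i)) ? a->comp[i] : 0;
}
```

Filling in the structures without setting their bits has no effect, which is the setcontext case this revision fixes.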
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present: x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
	machdep.fpu_save = 3 (implies xsaveopt)
	machdep.xsave_size = 832
	machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
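The reported size of 832 for features 7 (x87, SSE and AVX) follows from the standard-format XSAVE layout: 512 (legacy region) + 64 (header) + 256 (AVX upper YMM halves, at offset 576) = 832. The function below is a hypothetical illustration that only knows about AVX. Real code queries CPUID leaf 0xD for sizes and offsets.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LEGACY_SIZE	512	/* x87 + SSE legacy region */
#define TOY_HEADER_SIZE	64	/* XSAVE header (XSTATE_BV, ...) */
#define TOY_AVX_OFFSET	576	/* standard-format offset of component 2 */
#define TOY_AVX_SIZE	256	/* upper halves of YMM0-15 */
#define TOY_XCR0_AVX	((uint64_t)1 << 2)

/* Size of a standard-format XSAVE area for a feature mask.
 * Simplified: ignores every component above AVX. */
static unsigned
toy_xsave_size(uint64_t features)
{
	unsigned size = TOY_LEGACY_SIZE + TOY_HEADER_SIZE;

	if (features & TOY_XCR0_AVX)
		size = TOY_AVX_OFFSET + TOY_AVX_SIZE;
	return size;
}
```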
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode).
In fpudna():
 - Don't actually enable hardware interrupts unless we need to allow in IPIs.
 - There is no point in enabling them when they are blocked in software (by splhigh()).
 - Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
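The check described above can be sketched as pure functions over raw register values, such as CPUID leaf 1 ECX and the XCR0 value read with XGETBV. This makes the logic testable without hardware. The function names are hypothetical. The bit positions follow the Intel SDM: ECX bit 26 is XSAVE, bit 27 is OSXSAVE, and bit 28 is AVX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_XSAVE   (1u << 26) /* CPU implements XSAVE/XRSTOR */
#define CPUID1_ECX_OSXSAVE (1u << 27) /* CR4.OSXSAVE has been set */
#define CPUID1_ECX_AVX     (1u << 28) /* CPU implements AVX */

#define XCR0_SSE 0x2u
#define XCR0_AVX 0x4u

/* Only trust XSAVE once OSXSAVE is reported, not merely XSAVE. */
static bool
xsave_usable(uint32_t cpuid1_ecx)
{
	return (cpuid1_ecx & CPUID1_ECX_OSXSAVE) != 0;
}

/* AVX additionally needs the SSE and AVX components enabled in XCR0. */
static bool
avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
	return xsave_usable(cpuid1_ecx) &&
	    (cpuid1_ecx & CPUID1_ECX_AVX) != 0 &&
	    (xcr0 & (XCR0_SSE | XCR0_AVX)) == (XCR0_SSE | XCR0_AVX);
}
```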
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
- splhigh
- unbusy the current cpu's fpu
- create a new fpu state in memory
- install the state on the current cpu's fpu
- splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, which are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD; it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
- switch lwp to cpu2
- do float operation
- switch lwp to cpu3
- do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK: it needs to be detected dynamically, otherwise when masking MXCSR we lose some features (e.g. DAZ).
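Here is a hedged sketch of dynamic detection. A hypothetical helper parses the MXCSR_MASK field at bytes 28..31 of an FXSAVE image. In the kernel, that image would come from executing FXSAVE on a zeroed, aligned buffer. If the field reads as 0, the Intel SDM says to assume 0xFFBF. That default lacks the DAZ bit (0x40), which is why a fixed mask loses DAZ on CPUs that support it.

```c
#include <assert.h>
#include <stdint.h>

/* Mask the Intel SDM says to assume when FXSAVE stores 0 in MXCSR_MASK. */
#define MXCSR_MASK_DEFAULT 0x0000ffbfu

/* MXCSR.DAZ (denormals-are-zero), absent from the default mask. */
#define MXCSR_DAZ 0x0040u

/* Read MXCSR_MASK (little-endian, bytes 28..31) from an FXSAVE image. */
static uint32_t
mxcsr_mask_from_fxsave(const unsigned char fxarea[512])
{
	uint32_t mask = (uint32_t)fxarea[28] |
	    (uint32_t)fxarea[29] << 8 |
	    (uint32_t)fxarea[30] << 16 |
	    (uint32_t)fxarea[31] << 24;

	return mask != 0 ? mask : MXCSR_MASK_DEFAULT;
}
```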
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
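The masking step itself is a single AND with the detected mask. In this sketch, `mxcsr_sanitize` is a hypothetical name. A value containing unsupported bits would make (F)XRSTOR raise #GP, so those bits are cleared before the value is installed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clear MXCSR bits the CPU does not support before the value reaches
 * (F)XRSTOR; 'mxcsr_mask' is the dynamically detected MXCSR_MASK.
 */
static uint32_t
mxcsr_sanitize(uint32_t user_mxcsr, uint32_t mxcsr_mask)
{
	return user_mxcsr & mxcsr_mask;
}
```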
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
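The fix amounts to marking the freshly written components in XSTATE_BV. XRSTOR puts any requested component whose XSTATE_BV bit is clear into its initial state, which discards the new contents. Below is a minimal sketch using a simplified, hypothetical layout: a 512-byte legacy region followed by the 64-byte XSAVE header, whose first quadword is XSTATE_BV.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XCR0_X87 0x1ull /* state component 0 */
#define XCR0_SSE 0x2ull /* state component 1 */

/* Simplified XSAVE area: legacy region, then the 64-byte header. */
struct xsave_sketch {
	unsigned char legacy[512];
	uint64_t xstate_bv;
	uint64_t xcomp_bv;
	unsigned char header_reserved[48];
};

/*
 * Install x87/SSE contents supplied by userland (e.g. via setcontext)
 * and mark both components as present so XRSTOR actually loads them.
 */
static void
xsave_install_legacy(struct xsave_sketch *xs, const unsigned char src[512])
{
	memcpy(xs->legacy, src, sizeof(xs->legacy));
	xs->xstate_bv |= XCR0_X87 | XCR0_SSE;
}
```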
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area; it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
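The `machdep.xsave_features = 7` value above is a bitmask of XSAVE state components. Below is a small userland sketch that decodes such a mask. The table and function are hypothetical. The bit assignments follow the Intel SDM: bits 0 to 2 are x87, SSE and AVX, and bits 5 to 7 are the AVX-512 components.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Names of XSAVE state components 0..7, indexed by XCR0 bit number. */
static const char *const xsave_component_names[] = {
	"x87", "SSE", "AVX", "MPX BNDREGS", "MPX BNDCSR",
	"AVX-512 opmask", "AVX-512 ZMM_Hi256", "AVX-512 Hi16_ZMM",
};

/* Print each component set in 'features'; return how many were named. */
static int
print_xsave_features(uint64_t features)
{
	int n = 0;
	size_t ncomp = sizeof(xsave_component_names) /
	    sizeof(xsave_component_names[0]);

	for (size_t i = 0; i < ncomp; i++) {
		if (features & (UINT64_C(1) << i)) {
			printf("%s\n", xsave_component_names[i]);
			n++;
		}
	}
	return n;
}
```

For the value 7 reported above, this prints x87, SSE and AVX, one per line.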
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.83 |
|
25-Feb-2023 |
riastradh |
x86/fpu.c: Sprinkle KNF.
No functional change intended.
|
#
1.82 |
|
25-Feb-2023 |
riastradh |
x86: Add kthread_fpu_enter/exit support, take two.
This time, make sure to restore the FPU state when switching to a kthread in the middle of kthread_fpu_enter/exit.
This adds a single predicted-taken branch for the case of kthreads that are not in kthread_fpu_enter/exit, so it incurs a penalty only for threads that actually use it. Since it avoids FPU state switching in kthreads that do use the FPU, namely cgd worker threads, this should be a net performance win on systems using it and have negligible impact otherwise.
XXX pullup-10
|
#
1.81 |
|
25-Feb-2023 |
riastradh |
x86: Label boolean is_64bit argument to fpu_area_restore.
No functional change intended.
|
#
1.80 |
|
25-Feb-2023 |
riastradh |
x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use.
In fpu_kern_enter, make sure all the MXCSR exception status bits are set when we start using the FPU, so that instructions which exhibit MCDT are unaffected by it.
While here, zero all the other FPU registers in fpu_kern_enter.
In principle we could skip this step on future CPUs that fix the MCDT bug, but there's probably not much benefit -- workloads that do a lot of crypto in the kernel are probably better off using kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles in the first place.
For details, see: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
|
Revision tags: netbsd-10-base bouyer-sunxi-drm-base
|
#
1.79 |
|
20-Aug-2022 |
riastradh |
fpu_kern_enter/leave: Disable IPL assertions.
These don't work because mutex_enter/exit on a spin lock may raise an IPL but not lower it, if another spin lock was already held. For example,
mutex_enter(some_lock_at_IPL_VM); printf("foo\n"); fpu_kern_enter(); ... fpu_kern_leave(); mutex_exit(some_lock_at_IPL_VM);
will trigger the panic, because printf takes a lock at IPL_HIGH where the IPL wil remain until the mutex_exit. (This was a nightmare to track down before I remembered that detail of spin lock IPL semantics...)
|
#
1.78 |
|
24-May-2022 |
andvar |
fix various typos in comments, docs and log messages.
|
#
1.77 |
|
01-Apr-2022 |
riastradh |
x86, arm: Allow fpu_kern_enter/leave while cold.
Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point.
Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited.
Also initialize cpu0's ci_kfpu_spl.
|
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
|
#
1.76 |
|
24-Oct-2020 |
mgorny |
Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs
When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area.
The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs.
The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code.
This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
|
#
1.75 |
|
15-Oct-2020 |
mgorny |
Revert "Merge convert_xmm_s87.c into fpu.c"
I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
|
#
1.74 |
|
02-Aug-2020 |
riastradh |
Revert "Add kthread_fpu_enter/exit support to x86." for now.
Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
|
#
1.73 |
|
01-Aug-2020 |
riastradh |
Add kthread_fpu_enter/exit support to x86.
|
#
1.72 |
|
20-Jul-2020 |
riastradh |
Fix fpu_kern_enter in a softint that interrupted a softint.
We need to find the lwp that was originally interrupted to save its fpu state.
With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
|
#
1.71 |
|
20-Jul-2020 |
riastradh |
Save fpu state at IPL_VM to exclude fpu_kern_enter/leave.
This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not.
(I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
|
#
1.70 |
|
20-Jul-2020 |
riastradh |
Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint."
This only fixed part of the race, and we can do it more simply.
|
#
1.69 |
|
20-Jul-2020 |
riastradh |
Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled."
This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
|
#
1.68 |
|
13-Jul-2020 |
riastradh |
Limit x86 fpu_kern_enter/leave to IPL_VM or below.
There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads.
This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation).
KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK: it needs to be detected dynamically, otherwise when masking MXCSR we lose some features (e.g. DAZ).
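A minimal C sketch of why a fixed mask loses DAZ, assuming Intel's documented fallback of 0xFFBF (all low 16 bits except bit 6, DAZ) for CPUs whose FXSAVE image reports MXCSR_MASK as 0. choose_mxcsr_mask() and sanitize_mxcsr() are hypothetical names, not the functions in fpu.c; the same masking step is what keeps userland from setting reserved bits that would make xrstor fault (rev 1.19).

```c
#include <stdint.h>

/* MXCSR bit 6: DAZ (denormals-are-zero). */
#define MXCSR_DAZ		0x0040u

/*
 * Intel's documented fallback when FXSAVE reports MXCSR_MASK == 0:
 * every bit of the low 16 except DAZ.
 */
#define MXCSR_MASK_DEFAULT	0xffbfu

/* Hypothetical helper: pick the mask from the value FXSAVE reported. */
static uint32_t
choose_mxcsr_mask(uint32_t fxsave_reported_mask)
{
	return fxsave_reported_mask != 0 ?
	    fxsave_reported_mask : MXCSR_MASK_DEFAULT;
}

/*
 * Hypothetical helper: drop bits the CPU does not support, so that a
 * later FXRSTOR/XRSTOR of a userland-supplied MXCSR cannot fault.
 */
static uint32_t
sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```

With a detected mask of 0xFFFF, sanitize_mxcsr(0x1FC0, mask) keeps DAZ, while the hard-coded 0xFFBF fallback silently clears it.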
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev 1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
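A toy model of the XRSTOR rule this fix depends on: for each requested component, XRSTOR loads it from memory only if the matching XSTATE_BV bit is set, and otherwise puts it in its initial state. Component numbers follow the XSAVE feature bits (0 = x87, 1 = SSE, 2 = AVX); the structs, model_xrstor(), and the use of 0 as the initial state are simplifications for illustration, not kernel code.

```c
#include <stdint.h>

#define NCOMP	3	/* x87 (0), SSE (1), AVX (2) */

/* Toy register file: one 64-bit "value" per state component. */
struct regs_model {
	uint64_t comp[NCOMP];
};

/* Toy XSAVE image: the header's XSTATE_BV plus per-component data. */
struct xsave_image_model {
	uint64_t xstate_bv;
	uint64_t comp[NCOMP];
};

/*
 * Simplified standard-form XRSTOR: for every component requested in
 * rfbm, load it from memory if its XSTATE_BV bit is set, otherwise
 * reset it to the initial state (0 in this model).
 */
static void
model_xrstor(struct regs_model *cpu, const struct xsave_image_model *img,
    uint64_t rfbm)
{
	for (int i = 0; i < NCOMP; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (!(rfbm & bit))
			continue;	/* not requested: register untouched */
		cpu->comp[i] = (img->xstate_bv & bit) ? img->comp[i] : 0;
	}
}
```

Writing SSE state into the image without also setting bit 1 of xstate_bv makes the write invisible after the restore, which is the setcontext-before-first-FPU-use case this entry describes.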
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
    machdep.fpu_save = 3 (implies xsaveopt)
    machdep.xsave_size = 832
    machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
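The reported numbers can be checked by hand. A feature mask of 7 is x87 | SSE | AVX, and in the standard (non-compacted) XSAVE layout the 512-byte legacy region and the 64-byte XSAVE header are followed by the 256-byte AVX component at offset 576, giving 512 + 64 + 256 = 832 bytes. The sketch below hard-codes only those three components for illustration; real code should take the size from CPUID leaf 0xD (EBX of sub-leaf 0 reports the size needed for the features enabled in XCR0). xsave_area_size() is a hypothetical name.

```c
#include <stdint.h>

#define XSAVE_LEGACY_SIZE	512	/* x87 + SSE, FXSAVE layout */
#define XSAVE_HEADER_SIZE	64	/* XSTATE_BV, XCOMP_BV, reserved */
#define XSAVE_AVX_SIZE		256	/* upper halves of YMM0-15 */

/*
 * Size of a standard-format XSAVE area for a feature mask limited to
 * x87 (bit 0), SSE (bit 1) and AVX (bit 2). The legacy region and the
 * header are always present; AVX follows the header at offset 576.
 */
static unsigned
xsave_area_size(uint64_t features)
{
	unsigned size = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;

	if (features & (UINT64_C(1) << 2))
		size += XSAVE_AVX_SIZE;
	return size;
}
```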
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.79 |
|
20-Aug-2022 |
riastradh |
fpu_kern_enter/leave: Disable IPL assertions.
These don't work because mutex_enter/exit on a spin lock may raise an IPL but not lower it, if another spin lock was already held. For example,
    mutex_enter(some_lock_at_IPL_VM);
    printf("foo\n");
    fpu_kern_enter();
    ...
    fpu_kern_leave();
    mutex_exit(some_lock_at_IPL_VM);
will trigger the panic, because printf takes a lock at IPL_HIGH where the IPL will remain until the mutex_exit. (This was a nightmare to track down before I remembered that detail of spin lock IPL semantics...)
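A user-space model of the spin-lock IPL rule described above, assuming NetBSD's behaviour that a spin mutex raises the IPL on entry but only the release of the last spin mutex held by the CPU restores it. The IPL values and the spin_enter()/spin_exit() names are hypothetical stand-ins for mutex_enter/mutex_exit on spin mutexes.

```c
/* Hypothetical IPL levels, ordered low to high (values illustrative). */
enum { IPL_NONE = 0, IPL_VM = 6, IPL_HIGH = 8 };

/* Per-CPU state of the model. */
static int cur_ipl = IPL_NONE;
static int spin_count;		/* spin mutexes currently held */
static int saved_ipl;		/* IPL to restore when the last one drops */

/* Raise (never lower) the IPL and return the previous level. */
static int
splraise_model(int ipl)
{
	int old = cur_ipl;

	if (ipl > cur_ipl)
		cur_ipl = ipl;
	return old;
}

/* Entering a spin mutex: only the outermost one remembers the old IPL. */
static void
spin_enter(int ipl)
{
	int old = splraise_model(ipl);

	if (spin_count++ == 0)
		saved_ipl = old;
}

/* Leaving a spin mutex: lower the IPL only once none is held. */
static void
spin_exit(void)
{
	if (--spin_count == 0)
		cur_ipl = saved_ipl;
}
```

After the inner IPL_HIGH lock (the one printf takes) is released, the model still sits at IPL_HIGH, so an assertion that the IPL is at most IPL_VM in fpu_kern_enter() would fire even though the caller only holds an IPL_VM lock.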
|
#
1.78 |
|
24-May-2022 |
andvar |
fix various typos in comments, docs and log messages.
|
#
1.77 |
|
01-Apr-2022 |
riastradh |
x86, arm: Allow fpu_kern_enter/leave while cold.
Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point.
Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited.
Also initialize cpu0's ci_kfpu_spl.
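A sketch of the two checks this entry talks about, under stated assumptions: struct cpu_model, kfpu_enter_ok() and cold_model are hypothetical stand-ins for cpu_info, the KASSERTs in fpu_kern_enter() and the kernel's cold flag, and the IPL value is illustrative. It shows why ci_kfpu_spl needs -1 as its "not entered" value: a zero-initialized field would look like an fpu_kern_enter() that is already active. (The IPL assertions themselves were later disabled in rev 1.79.)

```c
#include <stdbool.h>

enum { IPL_VM = 6 };	/* illustrative value */

/* Hypothetical per-CPU state; -1 means "not inside fpu_kern_enter". */
struct cpu_model {
	int kfpu_spl;
};

static bool cold_model = true;	/* stands in for the boot-time cold flag */

/*
 * Preconditions checked on entry in this model: no nesting, and no
 * use above IPL_VM unless the system is still cold (interrupts are
 * all blocked then anyway). Returns false where a KASSERT would fire.
 */
static bool
kfpu_enter_ok(const struct cpu_model *ci, int cur_ipl)
{
	if (ci->kfpu_spl != -1)
		return false;		/* reentrance */
	if (!cold_model && cur_ipl > IPL_VM)
		return false;		/* would block IPL_SCHED/IPL_HIGH */
	return true;
}
```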
|
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
|
#
1.76 |
|
24-Oct-2020 |
mgorny |
Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs
When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area.
The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs.
The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code.
This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
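To make the truncation concrete, here is an illustrative C layout of the 8-byte FIP (or FDP) slot. The commit names the union members fa_64 and fa_32; the field names inside fa_32 and the helper legacy_recovered_ip() are assumptions. Storing a typical amd64 user-space address through the 32-bit offset field keeps only its low 32 bits.

```c
#include <stdint.h>

/* Illustrative layout of the FIP/FDP slot in an FXSAVE image. */
union fip_slot {
	uint64_t fa_64;			/* FXSAVE64: full 64-bit address */
	struct {
		uint32_t offset;	/* legacy FXSAVE: low 32 bits only */
		uint16_t segment;	/* code/data segment selector */
		uint16_t reserved;
	} fa_32;
};

/*
 * What the legacy format can give back for a 64-bit instruction
 * pointer: the upper 32 bits have nowhere to go and are lost.
 */
static uint64_t
legacy_recovered_ip(uint64_t real_ip)
{
	union fip_slot s = { 0 };

	s.fa_32.offset = (uint32_t)real_ip;
	return s.fa_32.offset;
}
```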
|
#
1.75 |
|
15-Oct-2020 |
mgorny |
Revert "Merge convert_xmm_s87.c into fpu.c"
I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
|
#
1.74 |
|
02-Aug-2020 |
riastradh |
Revert "Add kthread_fpu_enter/exit support to x86." for now.
Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
|
#
1.73 |
|
01-Aug-2020 |
riastradh |
Add kthread_fpu_enter/exit support to x86.
|
#
1.72 |
|
20-Jul-2020 |
riastradh |
Fix fpu_kern_enter in a softint that interrupted a softint.
We need to find the lwp that was originally interrupted to save its fpu state.
With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
|
#
1.71 |
|
20-Jul-2020 |
riastradh |
Save fpu state at IPL_VM to exclude fpu_kern_enter/leave.
This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not.
(I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
|
#
1.70 |
|
20-Jul-2020 |
riastradh |
Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint."
This only fixed part of the race, and we can do it more simply.
|
#
1.69 |
|
20-Jul-2020 |
riastradh |
Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled."
This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
|
#
1.68 |
|
13-Jul-2020 |
riastradh |
Limit x86 fpu_kern_enter/leave to IPL_VM or below.
There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads.
This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation).
KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
    fpu_kern_enter();
    /* do FPU stuff */
    fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
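A minimal model of the two update rules stated above: the kernel consumes at most the caller's iov_len bytes, and PT_SETXSTATE replaces only the components whose bits are set in xs_rfbm. struct xstate_model and apply_xstate() are hypothetical and much smaller than the real struct xstate; in particular, skipping a requested component that does not fit in the caller's buffer is a choice made for this sketch, not documented kernel behaviour.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XS_NCOMP	3	/* x87, SSE, AVX in this toy model */

/* Hypothetical, much-reduced stand-in for 'struct xstate'. */
struct xstate_model {
	uint64_t xs_rfbm;		/* components the caller supplies */
	uint64_t comp[XS_NCOMP];	/* toy 8-byte payloads */
};

/*
 * Copy no more than the caller's buffer (so an older, shorter struct
 * still works), then replace only the components named in xs_rfbm
 * that lie entirely inside the bytes actually supplied.
 */
static void
apply_xstate(struct xstate_model *lwp_state, const void *ubuf, size_t ulen)
{
	struct xstate_model in;
	size_t n = ulen < sizeof(in) ? ulen : sizeof(in);

	memset(&in, 0, sizeof(in));
	memcpy(&in, ubuf, n);

	for (size_t i = 0; i < XS_NCOMP; i++) {
		size_t end = offsetof(struct xstate_model, comp) +
		    (i + 1) * sizeof(in.comp[0]);

		if ((in.xs_rfbm & (UINT64_C(1) << i)) && end <= n)
			lwp_state->comp[i] = in.comp[i];
	}
}
```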
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
    fpu_save_area_clear -> fpu_clear
    fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
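Why rdi % 16 = 8 matters: FXSAVE requires its 512-byte save area to be 16-byte aligned and XSAVE requires 64-byte alignment, and a misaligned memory operand raises #GP. A small C sketch of checking and obtaining suitable alignment; FXSAVE_ALIGN, XSAVE_ALIGN, is_aligned() and alloc_xsave_area() are illustrative names, and aligned_alloc() is the standard C11 allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define FXSAVE_ALIGN	16	/* FXSAVE/FXRSTOR operand alignment */
#define XSAVE_ALIGN	64	/* XSAVE/XRSTOR operand alignment */

/* True if p is a multiple of align (align must be a power of two). */
static int
is_aligned(const void *p, uintptr_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}

/*
 * Allocate a save area usable by XSAVE (and therefore FXSAVE).
 * The size is rounded up because C11 aligned_alloc() expects a
 * multiple of the alignment.
 */
static void *
alloc_xsave_area(size_t size)
{
	size_t rounded = (size + XSAVE_ALIGN - 1) &
	    ~(size_t)(XSAVE_ALIGN - 1);

	return aligned_alloc(XSAVE_ALIGN, rounded);
}
```

A stack buffer declared with _Alignas(64) is another way to get the required alignment.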
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be; only the xen case checks the flag now. Need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
- splhigh
- unbusy the current cpu's fpu
- create a new fpu state in memory
- install the state on the current cpu's fpu
- splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.78 |
|
24-May-2022 |
andvar |
fix various typos in comments, docs and log messages.
|
#
1.77 |
|
01-Apr-2022 |
riastradh |
x86, arm: Allow fpu_kern_enter/leave while cold.
Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point.
Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited.
Also initialize cpu0's ci_kfpu_spl.
|
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
|
#
1.76 |
|
24-Oct-2020 |
mgorny |
Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs
When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area.
The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs.
The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code.
This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
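The truncation can be illustrated with a small model of the two encodings. The union below is only modelled on the description of union fp_addr above; `union fp_addr_model` and its field names are hypothetical and not the kernel's definition.

```c
#include <stdint.h>

/* Illustrative model of the FIP/FDP encodings; names are hypothetical. */
union fp_addr_model {
	uint64_t fa_64;			/* FXSAVE64/XSAVE64: full 64-bit address */
	struct {
		uint32_t fa_off;	/* legacy FXSAVE: 32-bit offset */
		uint16_t fa_seg;	/* 16-bit segment selector */
		uint16_t fa_rsvd;	/* 16-bit reserved field */
	} fa_32;
};

/* What the legacy split format can represent of a 64-bit pointer. */
static uint64_t
legacy_visible_address(uint64_t addr)
{
	union fp_addr_model a;

	a.fa_32.fa_off = (uint32_t)addr;	/* upper 32 bits are dropped */
	a.fa_32.fa_seg = 0;
	a.fa_32.fa_rsvd = 0;
	return a.fa_32.fa_off;
}

/* What the 64-bit format can represent of the same pointer. */
static uint64_t
full_visible_address(uint64_t addr)
{
	union fp_addr_model a;

	a.fa_64 = addr;
	return a.fa_64;
}
```

A typical 64-bit user-space address such as 0x00007fffdeadbeef comes back from the legacy path as 0xdeadbeef, which is why the old data was not meaningful for 64-bit programs.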
|
#
1.75 |
|
15-Oct-2020 |
mgorny |
Revert "Merge convert_xmm_s87.c into fpu.c"
I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
|
#
1.74 |
|
02-Aug-2020 |
riastradh |
Revert "Add kthread_fpu_enter/exit support to x86." for now.
Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
|
#
1.73 |
|
01-Aug-2020 |
riastradh |
Add kthread_fpu_enter/exit support to x86.
|
#
1.72 |
|
20-Jul-2020 |
riastradh |
Fix fpu_kern_enter in a softint that interrupted a softint.
We need to find the lwp that was originally interrupted to save its fpu state.
With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
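The lookup described above amounts to walking back through nested softints to the lwp whose FPU state is actually loaded on the CPU. The sketch below assumes a simplified model in which each softint lwp records the lwp it interrupted; `struct lwp_model`, its fields and `fpu_owner` are hypothetical names, not the kernel's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified lwp model. */
struct lwp_model {
	const char *name;
	bool is_softint;		/* true for a soft interrupt lwp */
	struct lwp_model *interrupted;	/* lwp this softint borrowed the CPU from */
};

/*
 * Follow the chain of interrupted lwps until reaching one that is not a
 * softint (or has nothing underneath it); that lwp owns the FPU state
 * that must be saved before kernel FPU use.
 */
static struct lwp_model *
fpu_owner(struct lwp_model *l)
{
	while (l->is_softint && l->interrupted != NULL)
		l = l->interrupted;
	return l;
}
```

Stopping one level too early (at the first softint) would save the state into the wrong lwp, which matches the instability described for softint-interrupts-softint.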
|
#
1.71 |
|
20-Jul-2020 |
riastradh |
Save fpu state at IPL_VM to exclude fpu_kern_enter/leave.
This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not.
(I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
|
#
1.70 |
|
20-Jul-2020 |
riastradh |
Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint."
This only fixed part of the race, and we can do it more simply.
|
#
1.69 |
|
20-Jul-2020 |
riastradh |
Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled."
This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
|
#
1.68 |
|
13-Jul-2020 |
riastradh |
Limit x86 fpu_kern_enter/leave to IPL_VM or below.
There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads.
This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation).
KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2;
Add a small API for in-kernel FPU operations.
	fpu_kern_enter();
	/* do FPU stuff */
	fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
branches: 1.55.2;
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
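The partial-update and buffer-size semantics described above can be modelled in a few lines. This is a sketch under stated assumptions: `struct xstate_model`, its fixed-size components, `xstate_apply` and `xstate_copyout` are hypothetical and do not reflect the real layout of 'struct xstate'; only the roles of 'xs_rfbm' and of the iovec length are taken from the description.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for 'struct xstate' with three components. */
#define XS_NCOMP 3
struct xstate_model {
	uint64_t xs_rfbm;		/* which components the caller supplies */
	uint8_t comp[XS_NCOMP][16];	/* illustrative fixed-size components */
};

/* Model of PT_SETXSTATE: replace only the components listed in xs_rfbm. */
static void
xstate_apply(struct xstate_model *cur, const struct xstate_model *req)
{
	for (int i = 0; i < XS_NCOMP; i++)
		if (req->xs_rfbm & (UINT64_C(1) << i))
			memcpy(cur->comp[i], req->comp[i], sizeof(cur->comp[i]));
}

/* Model of the iovec length handling: copy no more than both sides hold. */
static size_t
xstate_copyout(void *buf, size_t len, const struct xstate_model *xs)
{
	size_t n = len < sizeof(*xs) ? len : sizeof(*xs);

	memcpy(buf, xs, n);
	return n;
}
```

Because the copy length is the smaller of the caller's buffer and the structure, an old binary with a shorter buffer keeps working after the structure grows; it simply sees a prefix.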
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch; we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2;
Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is that, for some reason, a wrong stack alignment causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8, and as seen several months ago as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
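FXSAVE requires its memory operand to be 16-byte aligned and XSAVE requires 64-byte alignment; a misaligned operand faults. The sketch below shows the check behind "rdi % 16 = 8" and one way to make the alignment a property of the buffer type rather than of whatever the stack happens to provide. The names `is_aligned` and `struct save_area_model` are hypothetical, and 512 bytes is just the legacy FXSAVE area size used as an example.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define FXSAVE_ALIGN	16	/* FXSAVE/FXRSTOR operand alignment */
#define XSAVE_ALIGN	64	/* XSAVE/XRSTOR operand alignment */

/* True if p is a multiple of align (align must be a power of two). */
static bool
is_aligned(const void *p, uintptr_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}

/* The alignment is declared on the type, so every instance honours it. */
struct save_area_model {
	alignas(XSAVE_ALIGN) uint8_t bytes[512];
};
```

An address that is 8 bytes past a 16-byte boundary, as in the report above, fails the 16-byte check.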
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() independent of XSAVE on native, as it should be; only the Xen case checks the flag now. Need to investigate further why exactly the fault happens in the Xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
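The OSXSAVE test described above reduces to a bit check on CPUID leaf 1. CPUID.01H:ECX bit 26 reports that the CPU implements XSAVE, while bit 27 (OSXSAVE) reflects whether CR4.OSXSAVE has actually been enabled, which under Xen is decided by the hypervisor. The function name `use_xsave` is hypothetical; the sketch takes the ECX value as a parameter instead of executing CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits. */
#define CPUID1_ECX_XSAVE	(UINT32_C(1) << 26)	/* CPU implements XSAVE */
#define CPUID1_ECX_OSXSAVE	(UINT32_C(1) << 27)	/* CR4.OSXSAVE is set */

/* Use XSAVE only if it has actually been enabled, not merely implemented. */
static bool
use_xsave(uint32_t cpuid1_ecx)
{
	return (cpuid1_ecx & CPUID1_ECX_OSXSAVE) != 0;
}
```

Checking only the XSAVE bit would say yes on a CPU where the hypervisor has XSAVE disabled, which is the case the no-xsave testing above covers.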
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
	splhigh
	unbusy the current cpu's fpu
	create a new fpu state in memory
	install the state on the current cpu's fpu
	splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, which are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2;
Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD; it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
	switch lwp to cpu2
	do float operation
	switch lwp to cpu3
	do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
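The rule "whenever software changes the in-memory FPU state, it sets XSTATE_BV[i]=1" can be shown at the bit level. On XRSTOR, a requested component whose XSTATE_BV bit is clear is loaded with its initial state rather than from memory, so an in-memory edit without the bit set is silently ignored. The component bit numbers (x87 = 0, SSE = 1, AVX upper halves = 2) are architectural; `struct xsave_hdr_model` and `mark_component_modified` are hypothetical names.

```c
#include <stdint.h>

/* XSAVE state-component bits (same numbering as XCR0). */
#define XSTATE_X87	(UINT64_C(1) << 0)
#define XSTATE_SSE	(UINT64_C(1) << 1)
#define XSTATE_YMM	(UINT64_C(1) << 2)	/* upper halves of YMM (AVX) */

/* Hypothetical model of the XSAVE header's XSTATE_BV field. */
struct xsave_hdr_model {
	uint64_t xstate_bv;
};

/*
 * After editing component 'comp' in the save area, mark it as not in
 * its initial state so XRSTOR loads the edited bytes from memory.
 */
static void
mark_component_modified(struct xsave_hdr_model *h, uint64_t comp)
{
	h->xstate_bv |= comp;
}
```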
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK: it needs to be detected dynamically; otherwise, when masking MXCSR, we lose some features (e.g. DAZ).
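The effect of a static versus a detected mask can be sketched as follows. The FXSAVE image contains an MXCSR_MASK field; per the Intel SDM, a value of 0 means the CPU predates the field and 0xFFBF (every defined bit except DAZ, bit 6) should be assumed. The function names are hypothetical.

```c
#include <stdint.h>

#define MXCSR_DAZ		UINT32_C(0x0040)	/* denormals-are-zero, bit 6 */
#define MXCSR_DEFAULT_MASK	UINT32_C(0xffbf)	/* defined bits except DAZ */

/* Mask to use, given the MXCSR_MASK value found in an FXSAVE image. */
static uint32_t
effective_mxcsr_mask(uint32_t saved_mask)
{
	return saved_mask != 0 ? saved_mask : MXCSR_DEFAULT_MASK;
}

/* Clear bits the CPU does not support, so (F)XRSTOR cannot fault. */
static uint32_t
sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```

With the fixed 0xFFBF mask, a user value of 0x1fc0 (default 0x1f80 plus DAZ) comes back as 0x1f80, losing DAZ even on CPUs that support it; with a detected mask of 0xFFFF it is preserved, while reserved high bits are still cleared.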
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present; x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8;
Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4;
Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16;
Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
	machdep.fpu_save = 3 (implies xsaveopt)
	machdep.xsave_size = 832
	machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
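Assuming machdep.xsave_features reports the XCR0-style component bitmap (bits 0-2 are x87, SSE and the AVX upper YMM halves), the value 7 above means all three components. A small decoder, with hypothetical names, makes that reading explicit; it does not try to interpret machdep.fpu_save, whose encoding is not described here.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Names of the first three XSAVE state components, by bit number. */
static const char *const xsave_feature_names[] = {
	"x87",	/* bit 0 */
	"SSE",	/* bit 1 */
	"AVX",	/* bit 2: upper halves of the YMM registers */
};

/* Write the names of the set bits in 'features' into buf, space-separated. */
static void
decode_xsave_features(uint64_t features, char *buf, size_t len)
{
	buf[0] = '\0';
	for (unsigned i = 0; i < 3; i++) {
		if ((features & (UINT64_C(1) << i)) == 0)
			continue;
		size_t off = strlen(buf);
		/* snprintf truncates safely if buf is too small */
		snprintf(buf + off, len - off, "%s%s", off ? " " : "",
		    xsave_feature_names[i]);
	}
}
```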
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.77 |
|
01-Apr-2022 |
riastradh |
x86, arm: Allow fpu_kern_enter/leave while cold.
Normally these are forbidden above IPL_VM, so that FPU usage doesn't block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during builtin module initialization at boot, all interrupts are blocked anyway so it's a moot point.
Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't trip over an assertion about it while cold -- the assertion is meant to detect reentrance into fpu_kern_enter/leave, which is prohibited.
Also initialize cpu0's ci_kfpu_spl.
|
Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
|
#
1.76 |
|
24-Oct-2020 |
mgorny |
Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs
When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area.
The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs.
The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code.
This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
|
#
1.75 |
|
15-Oct-2020 |
mgorny |
Revert "Merge convert_xmm_s87.c into fpu.c"
I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
|
#
1.74 |
|
02-Aug-2020 |
riastradh |
Revert "Add kthread_fpu_enter/exit support to x86." for now.
Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
|
#
1.73 |
|
01-Aug-2020 |
riastradh |
Add kthread_fpu_enter/exit support to x86.
|
#
1.72 |
|
20-Jul-2020 |
riastradh |
Fix fpu_kern_enter in a softint that interrupted a softint.
We need to find the lwp that was originally interrupted to save its fpu state.
With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
|
#
1.71 |
|
20-Jul-2020 |
riastradh |
Save fpu state at IPL_VM to exclude fpu_kern_enter/leave.
This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not.
(I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
|
#
1.70 |
|
20-Jul-2020 |
riastradh |
Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint."
This only fixed part of the race, and we can do it more simply.
|
#
1.69 |
|
20-Jul-2020 |
riastradh |
Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled."
This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
|
#
1.68 |
|
13-Jul-2020 |
riastradh |
Limit x86 fpu_kern_enter/leave to IPL_VM or below.
There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads.
This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation).
KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
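The fault described above comes from an alignment requirement. FXSAVE/FXRSTOR need a 16-byte aligned memory operand, and XSAVE/XRSTOR need a 64-byte aligned one. A save area at an address where `addr % 16 == 8` therefore faults.

The sketch below shows how to check and force that alignment. It uses hypothetical names (`demo_is_aligned`, `struct demo_savefpu`) rather than NetBSD's own types.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FXSAVE_ALIGN	16	/* FXSAVE/FXRSTOR operand alignment */
#define DEMO_XSAVE_ALIGN	64	/* XSAVE/XRSTOR operand alignment */

/* True if p is aligned to align, which must be a power of two. */
static bool
demo_is_aligned(const void *p, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical save area that stays 64-byte aligned even on the stack,
 * because the compiler must honor the alignment specifier. */
struct demo_savefpu {
	alignas(64) unsigned char area[832];
};
```

Declaring the alignment on the type puts the burden on the compiler, rather than relying on the incidental alignment of the caller's stack frame.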
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
splhigh
unbusy the current cpu's fpu
create a new fpu state in memory
install the state on the current cpu's fpu
splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
- switch lwp to cpu2
- do float operation
- switch lwp to cpu3
- do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
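Here is a sketch of the XSTATE_BV rule described in this entry. The bit numbers (0 = x87, 1 = SSE, 2 = AVX) follow the Intel SDM; the header struct and helper name are hypothetical.

For any requested component whose XSTATE_BV bit is clear, XRSTOR initializes the component instead of loading it from memory. An edit to the in-memory image must therefore set the corresponding bit.

```c
#include <assert.h>
#include <stdint.h>

/* XSAVE state-component bits (Intel SDM numbering). */
#define DEMO_XCR0_X87	(1ULL << 0)
#define DEMO_XCR0_SSE	(1ULL << 1)
#define DEMO_XCR0_YMM	(1ULL << 2)

/* Hypothetical in-memory XSAVE header reduced to the one field used here. */
struct demo_xsave_hdr {
	uint64_t xstate_bv;
};

/* After software edits the in-memory image of the given components,
 * mark them present so the next XRSTOR loads them from memory rather
 * than resetting them to their initial configuration. */
static void
demo_mark_modified(struct demo_xsave_hdr *h, uint64_t components)
{
	h->xstate_bv |= components;
}
```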
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
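Below is a sketch of dynamic MXCSR_MASK detection and masking, under two assumptions:
- The mask is read from bytes 28-31 of an FXSAVE image.
- A value of zero there means the processor uses the default mask 0x0000FFBF, per the Intel SDM.

That default clears bit 6 (DAZ), which is why hard-coding it loses DAZ on processors that support it. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MXCSR_DEFAULT_MASK	0x0000FFBFu	/* used when the field reads 0 */
#define DEMO_MXCSR_DAZ		0x00000040u	/* bit 6: denormals-are-zero */

/* mxcsr_mask_field: MXCSR_MASK as stored by FXSAVE (bytes 28-31 of the
 * legacy area). Zero means the processor uses the default mask. */
static uint32_t
demo_mxcsr_mask(uint32_t mxcsr_mask_field)
{
	return mxcsr_mask_field != 0 ? mxcsr_mask_field : DEMO_MXCSR_DEFAULT_MASK;
}

/* Clear unsupported bits in a user-supplied MXCSR so that a later
 * FXRSTOR/XRSTOR does not fault on reserved bits set to 1. */
static uint32_t
demo_sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```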
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev 1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. No-FPU x86 is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
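The values reported in this entry are consistent with the standard (non-compacted) XSAVE layout:
- `xsave_features = 7` sets bits 0, 1 and 2 (x87, SSE, AVX).
- The area is 512 bytes of legacy region, plus a 64-byte header, plus 256 bytes for the upper halves of YMM0-15, giving 832.

Below is a small sketch of that arithmetic. It is valid only for feature masks drawn from those three bits; the helper names are hypothetical. The kernel obtains the real size from CPUID rather than computing it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard-format XSAVE region sizes (Intel SDM). */
#define DEMO_LEGACY_SIZE	512u	/* x87 + SSE (FXSAVE layout) */
#define DEMO_HEADER_SIZE	64u	/* XSAVE header */
#define DEMO_YMM_HI_SIZE	256u	/* 16 registers x 16 upper bytes */

#define DEMO_XCR0_X87	(1ULL << 0)
#define DEMO_XCR0_SSE	(1ULL << 1)
#define DEMO_XCR0_YMM	(1ULL << 2)

/* Standard-format size for a feature mask limited to x87, SSE and AVX.
 * The legacy region and the header are always present. */
static unsigned
demo_xsave_size(uint64_t features)
{
	unsigned size = DEMO_LEGACY_SIZE + DEMO_HEADER_SIZE;

	assert((features & ~(DEMO_XCR0_X87 | DEMO_XCR0_SSE | DEMO_XCR0_YMM)) == 0);
	if (features & DEMO_XCR0_YMM)
		size += DEMO_YMM_HI_SIZE;
	return size;
}

/* Name of a state component, given its bit number. */
static const char *
demo_component_name(unsigned bit)
{
	switch (bit) {
	case 0: return "x87";
	case 1: return "SSE";
	case 2: return "AVX";
	default: return "unknown";
	}
}
```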
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.76 |
|
24-Oct-2020 |
mgorny |
Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs
When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64 use the 64-suffixed variant in order to include the complete FIP/FDP registers in the x87 area.
The difference between the two variants is that the FXSAVE64 (new) variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64), while the legacy FXSAVE variant uses split fields: 32-bit offset, 16-bit segment and 16-bit reserved field (union fp_addr.fa_32). The latter implies that the actual addresses are truncated to 32 bits which is insufficient in modern programs.
The change is applied only to 64-bit programs on amd64. Plain i386 and compat32 continue using plain FXSAVE. Similarly, NVMM is not changed as I am not familiar with that code.
This is a potentially breaking change. However, I don't think it likely to actually break anything because the data provided by the old variant were not meaningful (because of the truncated pointer).
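The sketch below illustrates the truncation described in this entry. It models the two encodings with hypothetical names:
- a single 64-bit field, and
- a 32-bit offset plus a 16-bit segment plus a 16-bit reserved field.

It is not a copy of NetBSD's `union fp_addr`. Under the legacy encoding, only the low 32 bits of a 64-bit instruction address survive.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two FIP/FDP encodings; both occupy 8 bytes. */
union demo_fp_addr {
	uint64_t fa_64;			/* FXSAVE64: full 64-bit address */
	struct {
		uint32_t fa_off;	/* legacy FXSAVE: 32-bit offset */
		uint16_t fa_seg;	/* 16-bit segment selector */
		uint16_t fa_rsvd;	/* 16-bit reserved field */
	} fa_32;
};

/* Fill the field the way each instruction variant would, given the
 * 64-bit address of the last x87 instruction and its code selector. */
static union demo_fp_addr
demo_store_fip(uint64_t rip, uint16_t cs, int use_fxsave64)
{
	union demo_fp_addr a = { .fa_64 = 0 };

	if (use_fxsave64) {
		a.fa_64 = rip;
	} else {
		a.fa_32.fa_off = (uint32_t)rip;	/* upper 32 bits are lost */
		a.fa_32.fa_seg = cs;
	}
	return a;
}
```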
|
#
1.75 |
|
15-Oct-2020 |
mgorny |
Revert "Merge convert_xmm_s87.c into fpu.c"
I am going to add ATF tests for these two functions, and having them in a separate file will make it more convenient to build and run them in userspace.
|
#
1.74 |
|
02-Aug-2020 |
riastradh |
Revert "Add kthread_fpu_enter/exit support to x86." for now.
Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
|
#
1.73 |
|
01-Aug-2020 |
riastradh |
Add kthread_fpu_enter/exit support to x86.
|
#
1.72 |
|
20-Jul-2020 |
riastradh |
Fix fpu_kern_enter in a softint that interrupted a softint.
We need to find the lwp that was originally interrupted to save its fpu state.
With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
|
#
1.71 |
|
20-Jul-2020 |
riastradh |
Save fpu state at IPL_VM to exclude fpu_kern_enter/leave.
This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not.
(I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
|
#
1.70 |
|
20-Jul-2020 |
riastradh |
Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint."
This only fixed part of the race, and we can do it more simply.
|
#
1.69 |
|
20-Jul-2020 |
riastradh |
Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled."
This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
|
#
1.68 |
|
13-Jul-2020 |
riastradh |
Limit x86 fpu_kern_enter/leave to IPL_VM or below.
There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads.
This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation).
KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
branches: 1.55.2; More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. no FPU x86 is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
  machdep.fpu_save = 3 (implies xsaveopt)
  machdep.xsave_size = 832
  machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.74 |
|
02-Aug-2020 |
riastradh |
Revert "Add kthread_fpu_enter/exit support to x86." for now.
Need to find all the paths out of interrupts back into _kernel_ context to add HANDLE_DEFERRED_FPU, I think, before this can be enabled.
|
#
1.73 |
|
01-Aug-2020 |
riastradh |
Add kthread_fpu_enter/exit support to x86.
|
#
1.72 |
|
20-Jul-2020 |
riastradh |
Fix fpu_kern_enter in a softint that interrupted a softint.
We need to find the lwp that was originally interrupted to save its fpu state.
With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
|
#
1.71 |
|
20-Jul-2020 |
riastradh |
Save fpu state at IPL_VM to exclude fpu_kern_enter/leave.
This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not.
(I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
|
#
1.70 |
|
20-Jul-2020 |
riastradh |
Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint."
This only fixed part of the race, and we can do it more simply.
|
#
1.69 |
|
20-Jul-2020 |
riastradh |
Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled."
This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
|
#
1.68 |
|
13-Jul-2020 |
riastradh |
Limit x86 fpu_kern_enter/leave to IPL_VM or below.
There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads.
This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation).
KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
  fpu_kern_enter();
  /* do FPU stuff */
  fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
  fpu_save_area_clear -> fpu_clear
  fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be; only the xen case checks the flag now. Need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
  splhigh
  unbusy the current cpu's fpu
  create a new fpu state in memory
  install the state on the current cpu's fpu
  splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is copied into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault; we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
  switch lwp to cpu2
  do float operation
  switch lwp to cpu3
  do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
  machdep.fpu_save = 3 (implies xsaveopt)
  machdep.xsave_size = 832
  machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.73 |
|
01-Aug-2020 |
riastradh |
Add kthread_fpu_enter/exit support to x86.
|
#
1.72 |
|
20-Jul-2020 |
riastradh |
Fix fpu_kern_enter in a softint that interrupted a softint.
We need to find the lwp that was originally interrupted to save its fpu state.
With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
|
#
1.71 |
|
20-Jul-2020 |
riastradh |
Save fpu state at IPL_VM to exclude fpu_kern_enter/leave.
This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not.
(I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
|
#
1.70 |
|
20-Jul-2020 |
riastradh |
Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint."
This only fixed part of the race, and we can do it more simply.
|
#
1.69 |
|
20-Jul-2020 |
riastradh |
Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled."
This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
|
#
1.68 |
|
13-Jul-2020 |
riastradh |
Limit x86 fpu_kern_enter/leave to IPL_VM or below.
There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads.
This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation).
KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
  fpu_kern_enter();
  /* do FPU stuff */
  fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
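The iovec convention can be illustrated with a toy model. This is not the NetBSD kernel code and not the real 'struct xstate' layout: 'struct toy_xstate_v1', 'struct toy_xstate_v2' and toy_getxstate() are hypothetical. The sketch only shows why copying min(iov_len, sizeof(state)) bytes lets a caller built against an older, smaller structure keep working after the kernel's structure grows at the end.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical layouts, NOT the real struct xstate: v2 extends v1 at the end. */
struct toy_xstate_v1 {
	unsigned long rfbm;		/* stands in for xs_rfbm */
	unsigned char ymm_hi[32];
};

struct toy_xstate_v2 {
	unsigned long rfbm;
	unsigned char ymm_hi[32];
	unsigned char zmm_hi[64];	/* component added in a "later release" */
};

/* Toy GETXSTATE: copy only as many bytes as the caller's iovec can hold. */
static size_t
toy_getxstate(const struct toy_xstate_v2 *kstate, const struct iovec *iov)
{
	size_t n = iov->iov_len < sizeof(*kstate) ? iov->iov_len : sizeof(*kstate);

	memcpy(iov->iov_base, kstate, n);
	return n;		/* number of bytes actually filled in */
}
```

An old caller passing sizeof(struct toy_xstate_v1) gets exactly the prefix it understands; the return value tells it how much was filled in.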
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2;
Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
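For context: FXSAVE/FXRSTOR raise #GP if their memory operand is not 16-byte aligned, and XSAVE/XRSTOR require 64-byte alignment, so a save area at an address with rdi % 16 = 8 cannot work with either. A small C11 sketch of declaring and checking such a buffer; 'struct toy_save_area' and the helper names are hypothetical, and 512 + 64 bytes (legacy region plus XSAVE header) is only the minimum XSAVE area size, not the size fpu.c actually uses.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* FXSAVE/FXRSTOR need 16-byte alignment. */
static bool
fxsave_ok(const void *p)
{
	return ((uintptr_t)p & 15) == 0;
}

/* XSAVE/XRSTOR need 64-byte alignment. */
static bool
xsave_ok(const void *p)
{
	return ((uintptr_t)p & 63) == 0;
}

/* Hypothetical buffer: legacy region (512) + XSAVE header (64), 64-byte aligned. */
struct toy_save_area {
	alignas(64) unsigned char bytes[512 + 64];
};
```

Aligning to 64 satisfies both instructions at once; a pointer 8 bytes into such a buffer reproduces the faulting alignment.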
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
As a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE, as it should be; only the xen case checks the flag now. Need to investigate further why exactly the fault happens for the xen no-xsave case.
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
Fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication.
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
- splhigh
- unbusy the current cpu's fpu
- create a new fpu state in memory
- install the state on the current cpu's fpu
- splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, which are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2;
Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcopied into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
- switch lwp to cpu2
- do float operation
- switch lwp to cpu3
- do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
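The two skip rules XSAVEOPT relies on (components in their initial state, and components not modified since an XRSTOR from the same address) can be modelled as bit masks. This is a simplified toy, not the processor's exact algorithm: 'struct toy_tracker' and toy_xsaveopt_writes() are hypothetical, and real hardware also checks conditions such as privilege level and VMX mode before applying the modified optimization.

```c
#include <stdint.h>

/* Toy tracking state, loosely modelled on XINUSE / XMODIFIED. */
struct toy_tracker {
	uint64_t xinuse;		/* bit i: component i is not in its init state */
	uint64_t xmodified;		/* bit i: component i changed since the last XRSTOR */
	const void *last_xrstor_addr;	/* address used by the last XRSTOR */
};

/* Mask of requested components whose data a toy XSAVEOPT would write. */
static uint64_t
toy_xsaveopt_writes(const struct toy_tracker *t, const void *area, uint64_t rfbm)
{
	uint64_t writes = rfbm & t->xinuse;	/* init-state components: skipped */

	if (area == t->last_xrstor_addr)
		writes &= t->xmodified;		/* unmodified since XRSTOR: skipped */
	return writes;
}
```

In this model, saving to a different address than the last XRSTOR writes every in-use component again, which is why keeping one fixed save address per LWP matters for the optimization.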
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
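A sketch of dynamic mask detection and masking, under stated assumptions: the helper and macro names are hypothetical; the MXCSR_MASK location (bytes 28-31 of the FXSAVE image) and the 0xFFBF fallback when that field reads as zero come from the Intel SDM rather than from this file; 0x1F80 is the architectural MXCSR reset value. It shows how a fixed 0xFFBF mask clears DAZ (bit 6) even on CPUs that support it.

```c
#include <stdint.h>

#define TOY_DEFAULT_MXCSR_MASK	0x0000ffbfu	/* fallback when FXSAVE reports 0 */
#define TOY_MXCSR_DAZ		0x00000040u	/* denormals-are-zero, bit 6 */

/* fxsave_mask: the MXCSR_MASK field read from an FXSAVE image. */
static uint32_t
toy_mxcsr_mask(uint32_t fxsave_mask)
{
	return fxsave_mask != 0 ? fxsave_mask : TOY_DEFAULT_MXCSR_MASK;
}

/* Clear reserved bits so a later FXRSTOR/XRSTOR cannot fault on them. */
static uint32_t
toy_sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```

With the fallback mask, 0x1F80 | DAZ (0x1FC0) comes back as 0x1F80; with a detected mask of 0xFFFF, DAZ survives and reserved high bits are still stripped.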
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
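The failure mode can be shown with a toy model of how XRSTOR treats XSTATE_BV: a component that was filled in memory but whose XSTATE_BV bit is clear is reset to its init state instead of being loaded. This is a simplification (real init states are not all zeros; the x87 control word, for example, resets to 0x037F), and 'struct toy_xsave' and toy_xrstor() are hypothetical.

```c
#include <stdint.h>

#define TOY_NCOMP 3	/* x87, SSE, AVX: XSTATE_BV bits 0, 1, 2 */

/* Hypothetical stand-in for an XSAVE area: one word per component. */
struct toy_xsave {
	uint64_t xstate_bv;
	uint32_t comp[TOY_NCOMP];
};

/*
 * Toy XRSTOR: for each requested component, load it from memory if its
 * XSTATE_BV bit is set, otherwise reset it to its init state (modelled as 0).
 */
static void
toy_xrstor(uint32_t regs[TOY_NCOMP], const struct toy_xsave *area, uint64_t rfbm)
{
	for (int i = 0; i < TOY_NCOMP; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (!(rfbm & bit))
			continue;	/* not requested: register left unchanged */
		regs[i] = (area->xstate_bv & bit) ? area->comp[i] : 0;
	}
}
```

Setting the XSTATE_BV bit for each structure that was just filled in is what makes the restore pick up the new values.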
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present: x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8;
Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4;
Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16;
Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
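The reported numbers are consistent with the standard (non-compacted) XSAVE layout when only the x87, SSE and AVX components are enabled (XCR0 bits 0-2, hence xsave_features = 7): 512 bytes of legacy region, a 64-byte header, and 256 bytes for the upper halves of YMM0-YMM15. A worked check in C; the macro and function names are hypothetical, and a real kernel takes the size from CPUID leaf 0xD instead of hard-coding component sizes.

```c
#include <stdint.h>

/* Standard-format XSAVE sizes for the first three components. */
#define TOY_LEGACY_SIZE	512u	/* x87 + SSE legacy region (FXSAVE layout) */
#define TOY_HEADER_SIZE	64u	/* XSAVE header, holds XSTATE_BV */
#define TOY_AVX_SIZE	256u	/* upper 128 bits of YMM0-YMM15 */

#define TOY_XCR0_X87	0x1u
#define TOY_XCR0_SSE	0x2u
#define TOY_XCR0_YMM	0x4u

/* Size of the save area for a feature mask limited to x87/SSE/AVX. */
static uint32_t
toy_xsave_size(uint32_t features)
{
	uint32_t size = TOY_LEGACY_SIZE + TOY_HEADER_SIZE;

	if (features & TOY_XCR0_YMM)
		size += TOY_AVX_SIZE;
	return size;
}
```

So toy_xsave_size(7) gives 512 + 64 + 256 = 832, matching machdep.xsave_size above, while a mask of 3 (no AVX) gives 576.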
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
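A minimal sketch of the "write to the entire buffer" idea, assuming a caller-visible buffer that can be larger than the saved state (for example a 108-byte FNSAVE image copied into a 512-byte FXSAVE-sized slot). The helper name is hypothetical and this is not the fpu.c code; it only shows that zeroing the whole destination first leaves no stale bytes behind a short copy.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy copy-out: always write all dst_len bytes, so any tail beyond a short
 * saved state is zero rather than stale (possibly kernel) memory.
 */
static void
toy_fpregs_export(void *dst, size_t dst_len, const void *src, size_t src_len)
{
	size_t n = src_len < dst_len ? src_len : dst_len;

	memset(dst, 0, dst_len);
	memcpy(dst, src, n);
}
```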
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode).
In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.72 |
|
20-Jul-2020 |
riastradh |
Fix fpu_kern_enter in a softint that interrupted a softint.
We need to find the lwp that was originally interrupted to save its fpu state.
With this, fpu-heavy programs (like firefox) are once again stable, at least under modest stress testing, on systems configured to use wifi with WPA2 and CCMP.
|
#
1.71 |
|
20-Jul-2020 |
riastradh |
Save fpu state at IPL_VM to exclude fpu_kern_enter/leave.
This way fpu_kern_enter/leave cannot interrupt the transition, so the transition from state-on-CPU to state-in-memory (with TS set) is atomic whether in an interrupt or not.
(I am not 100% convinced that this is necessary, but it makes reasoning about the transition simpler.)
|
#
1.70 |
|
20-Jul-2020 |
riastradh |
Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint."
This only fixed part of the race, and we can do it more simply.
|
#
1.69 |
|
20-Jul-2020 |
riastradh |
Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled."
This didn't actually avoid double-restore, and it doesn't solve the problem anyway, and made it harder to detect in-kernel fpu abuse.
|
#
1.68 |
|
13-Jul-2020 |
riastradh |
Limit x86 fpu_kern_enter/leave to IPL_VM or below.
There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads.
This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation).
KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete the newly unused send_sigill.
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
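The reported numbers agree with a simple worked calculation, assuming the standard XSAVE layout. Feature mask 7 is x87 (bit 0) | SSE (bit 1) | AVX (bit 2). The area is the 512-byte legacy region, plus the 64-byte XSAVE header, plus 256 bytes for the upper halves of YMM0 to YMM15: 512 + 64 + 256 = 832. The helper below is hypothetical and hard-codes those sizes. Real code should ask CPUID leaf 0xD instead.

```c
#include <assert.h>
#include <stdint.h>

#define XFEAT_X87	0x1	/* bit 0 */
#define XFEAT_SSE	0x2	/* bit 1 */
#define XFEAT_AVX	0x4	/* bit 2: upper halves of YMM registers */

#define XSAVE_LEGACY_BYTES	512	/* FXSAVE-compatible x87 + SSE region */
#define XSAVE_HEADER_BYTES	64	/* XSTATE_BV, XCOMP_BV, reserved */
#define XSAVE_AVX_BYTES		256	/* YMM0..YMM15, high 128 bits each */

/* Standard-format XSAVE size, modelled only for feature bits 0..2. */
static unsigned
xsave_size_for(uint64_t features)
{
	/* x87 and SSE live in the legacy region, which is always present. */
	unsigned size = XSAVE_LEGACY_BYTES + XSAVE_HEADER_BYTES;

	assert((features & ~(uint64_t)(XFEAT_X87 | XFEAT_SSE | XFEAT_AVX)) == 0);
	if (features & XFEAT_AVX)
		size += XSAVE_AVX_BYTES;
	return size;
}
```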
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.68 |
|
13-Jul-2020 |
riastradh |
Limit x86 fpu_kern_enter/leave to IPL_VM or below.
There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I know, and although we generally limit the amount of time spent in any one crypto operation -- e.g., cgd is usually limited to processing 512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and IPL_HIGH interrupts at all. This should make ddb a little more accessible during crypto-heavy workloads.
This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the same will go for any new crypto subsystems, like the ChaCha and Poly1305 ones I'm drafting. It might be better to prohibit them altogether in hard interrupt context, but right now cprng_fast and cprng_strong are both technically allowed at IPL_VM and are sometimes used there (e.g., for opencrypto CBC IV generation).
KASSERT the ilevel to detect violation of this constraint in case I'm wrong.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
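The iovec-based size negotiation and the xs_rfbm partial update can be sketched in plain C. Almost everything here is hypothetical: `struct demo_xstate`, its component bits, and both functions stand in for the real `struct xstate` and the kernel code, whose layout is not reproduced. In this demo, bit 0 stands for the whole legacy region. Only `struct iovec` (from `<sys/uio.h>`) is a real type. The GET side copies min(iov_len, sizeof) bytes. The SET side rejects a listed component that does not fit in the supplied buffer and leaves unlisted components untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical stand-in for struct xstate; not the real layout. */
struct demo_xstate {
	uint64_t xs_rfbm;			/* components present / to update */
	unsigned char xs_legacy[512];		/* x87 + SSE image */
	unsigned char xs_ymm_hi128[256];	/* AVX upper halves */
};

#define DEMO_COMP_LEGACY	0x1	/* hypothetical bit assignments */
#define DEMO_COMP_YMM		0x4

/* GET: never copy more than the caller's buffer holds. */
static size_t
demo_get_xstate(const struct demo_xstate *kern, const struct iovec *iov)
{
	size_t n;

	assert(kern != NULL && iov != NULL);
	n = iov->iov_len < sizeof(*kern) ? iov->iov_len : sizeof(*kern);
	memcpy(iov->iov_base, kern, n);
	return n;
}

/* SET: only components listed in xs_rfbm replace the current state. */
static int
demo_set_xstate(struct demo_xstate *kern, const struct iovec *iov)
{
	struct demo_xstate in;
	size_t n;

	assert(kern != NULL && iov != NULL);
	if (iov->iov_len < sizeof(in.xs_rfbm))
		return -1;			/* cannot even read the mask */
	n = iov->iov_len < sizeof(in) ? iov->iov_len : sizeof(in);
	memset(&in, 0, sizeof(in));
	memcpy(&in, iov->iov_base, n);

	/* A listed component must lie entirely inside the user buffer. */
	if ((in.xs_rfbm & DEMO_COMP_LEGACY) &&
	    n < offsetof(struct demo_xstate, xs_legacy) + sizeof(in.xs_legacy))
		return -1;
	if ((in.xs_rfbm & DEMO_COMP_YMM) &&
	    n < offsetof(struct demo_xstate, xs_ymm_hi128) + sizeof(in.xs_ymm_hi128))
		return -1;

	if (in.xs_rfbm & DEMO_COMP_LEGACY)
		memcpy(kern->xs_legacy, in.xs_legacy, sizeof(kern->xs_legacy));
	if (in.xs_rfbm & DEMO_COMP_YMM)
		memcpy(kern->xs_ymm_hi128, in.xs_ymm_hi128,
		    sizeof(kern->xs_ymm_hi128));
	return 0;
}
```

A caller compiled against an older, shorter structure simply passes a smaller iov_len. It gets a truncated read, and it can still update the components its structure knows about.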
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
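Here is a sketch of the variable-sized copy. It uses a hypothetical `struct demo_pcb` whose buffer is sized for an arbitrary, purely illustrative maximum. Only `save_size` bytes are copied, for example 108 bytes for a 32-bit protected-mode FNSAVE image.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_FPU_AREA_MAX	2688	/* illustrative upper bound only */
#define FNSAVE_BYTES		108	/* 32-bit protected-mode FNSAVE image */

/* Hypothetical PCB: room for the largest format, most of it often unused. */
struct demo_pcb {
	unsigned char fpu_area[DEMO_FPU_AREA_MAX];
};

/* Copy just the bytes the active save format occupies. */
static void
demo_fpu_copy(struct demo_pcb *dst, const struct demo_pcb *src,
    size_t save_size)
{
	assert(dst != NULL && src != NULL);
	assert(save_size <= sizeof(dst->fpu_area));
	memcpy(dst->fpu_area, src->fpu_area, save_size);
}
```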
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
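For reference: FXSAVE/FXRSTOR require a 16-byte-aligned memory operand, and the XSAVE family requires 64-byte alignment. A misaligned operand raises #GP, so an address with rdi % 16 = 8 faults. The sketch below, with hypothetical helper names, checks alignment and declares a stack buffer using C11 `alignas`. Compilers typically assume the incoming stack already meets the ABI boundary (16 bytes on amd64). So requesting 16 bytes for a local usually emits no realignment code, while a 64-byte request makes the compiler realign the stack pointer explicitly.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define FXSAVE_ALIGN	16	/* FXSAVE/FXRSTOR operand alignment */
#define XSAVE_ALIGN	64	/* XSAVE/XRSTOR operand alignment */

/* True when addr is a multiple of align (align must be a power of two). */
static int
is_aligned_addr(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}

/* A local save area the compiler must place on a 64-byte boundary. */
static int
demo_stack_area_ok(void)
{
	alignas(XSAVE_ALIGN) unsigned char area[1024];

	area[0] = 0;
	return is_aligned_addr((uintptr_t)area, XSAVE_ALIGN) &&
	    is_aligned_addr((uintptr_t)area, FXSAVE_ALIGN);
}
```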
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
- splhigh
- unbusy the current cpu's fpu
- create a new fpu state in memory
- install the state on the current cpu's fpu
- splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.67 |
|
06-Jul-2020 |
riastradh |
Restore the lwp's fpu state, not zeros, and leave with fpu enabled.
We need to clear the fpu state anyway because it is likely to contain secrets at this point. Previously we set it to zeros, and then issued stts to disable the fpu in order to detect the mistake of further use of the fpu in kernel. But there must be some path I haven't identified yet that doesn't do fpu_handle_deferred, leading to fpudna panics.
In any case, there's no benefit to restoring the fpu state twice (once with zeros and once with the real data). The downside is, although this avoids spurious fpudna traps, using fpu_kern_enter in a softint has the side effect that -- until the next userland context switch triggering stts -- we no longer detect misuse of fpu in the kernel in that lwp. This will serve for now, but we should find another way to issue clts/stts judiciously to detect such misuse.
May improve the continued symptoms of https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html although may not fix everything.
|
#
1.66 |
|
06-Jul-2020 |
riastradh |
Fix race in fpu save with fpu_kern_enter in softint.
Likely source of:
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
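FXSAVE and FXRSTOR require a 16-byte aligned memory operand and raise #GP otherwise, which is why rdi % 16 = 8 is fatal here. The sketch below shows an aligned 512-byte buffer type and an alignment check. The names are hypothetical. Note that _Alignas on a stack object only helps if the incoming stack is aligned the way the compiler assumes, and that assumption is exactly what the log suspects was violated.

```c
#include <stdbool.h>
#include <stdint.h>

#define FXSAVE_AREA_SIZE 512	/* legacy FXSAVE image size */
#define FXSAVE_ALIGN     16	/* FXSAVE/FXRSTOR operand alignment */

/* Buffer type whose alignment the compiler honors for static objects
 * and, given a correctly aligned stack, for locals. */
struct fxsave_buf {
	_Alignas(FXSAVE_ALIGN) uint8_t bytes[FXSAVE_AREA_SIZE];
};

/* True if p satisfies the FXSAVE/FXRSTOR operand alignment rule. */
static bool
fxsave_operand_ok(const void *p)
{
	return ((uintptr_t)p % FXSAVE_ALIGN) == 0;
}
```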
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be; only the xen case checks the flag now. Need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
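The OSXSAVE test in 1.40 reads CPUID leaf 1, ECX bit 27. The CPU sets that bit only when the OS has enabled XSAVE via CR4.OSXSAVE. Bit 26 only reports that the hardware supports XSAVE. Below is a small sketch that uses the GCC/Clang `<cpuid.h>` helper `__get_cpuid`. The function names are hypothetical, and this is not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.1:ECX bit 27: the OS has set CR4.OSXSAVE, so XSAVE/XGETBV are usable. */
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27)

/* Decide from a raw CPUID.1:ECX value whether XSAVE may be used. */
static bool
xsave_usable(uint32_t cpuid1_ecx)
{
	return (cpuid1_ecx & CPUID1_ECX_OSXSAVE) != 0;
}

/* Query the running CPU; always false on non-x86 builds. */
static bool
xsave_usable_here(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;	/* leaf 1 not supported */
	return xsave_usable(ecx);
#else
	return false;
#endif
}
```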
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
- splhigh
- unbusy the current cpu's fpu
- create a new fpu state in memory
- install the state on the current cpu's fpu
- splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
- switch lwp to cpu2
- do float operation
- switch lwp to cpu3
- do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
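The XSTATE_BV rule above can be illustrated as follows. When XRSTOR finds a component's XSTATE_BV bit clear, it puts that component into its initial configuration instead of loading it from memory. Software that edits the saved image must therefore set the matching bit. Below is a minimal sketch that assumes a hypothetical, simplified standard-format image: a 512-byte legacy region followed by the 64-byte XSAVE header. FCW at byte 0 and XMM0-XMM15 at bytes 160-415 follow the FXSAVE layout.

```c
#include <stdint.h>
#include <string.h>

#define XSTATE_X87 (UINT64_C(1) << 0)
#define XSTATE_SSE (UINT64_C(1) << 1)

/* Hypothetical standard-format XSAVE image: legacy region + header. */
struct xsave_image {
	uint8_t  legacy[512];	/* FXSAVE-compatible region */
	uint64_t xstate_bv;	/* header bytes 0-7 */
	uint64_t xcomp_bv;	/* header bytes 8-15 (0 = standard format) */
	uint8_t  hdr_rsvd[48];	/* must be zero */
};

/* Change the saved x87 control word and mark the x87 component present,
 * so XRSTOR loads it instead of resetting FCW to its initial 0x037F. */
static void
xsave_set_fcw(struct xsave_image *xs, uint16_t fcw)
{
	memcpy(&xs->legacy[0], &fcw, sizeof(fcw));
	xs->xstate_bv |= XSTATE_X87;
}

/* Change saved XMM register reg (0-15) and mark the SSE component present. */
static void
xsave_set_xmm(struct xsave_image *xs, unsigned int reg, const uint8_t val[16])
{
	if (reg > 15)
		return;
	memcpy(&xs->legacy[160 + 16 * reg], val, 16);
	xs->xstate_bv |= XSTATE_SSE;
}
```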
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
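Dynamic MXCSR_MASK detection and the masking added in 1.19 can be sketched as follows. Per the Intel SDM, bytes 28-31 of an FXSAVE image hold MXCSR_MASK. A value of 0 means the CPU predates the field, and the default mask 0x0000FFBF applies, with DAZ (bit 6) unsupported. Loading MXCSR with a bit outside the mask raises #GP, which is the XRSTOR fault that 1.19 guards against. The helper names are hypothetical, and the sketch assumes a little-endian host, as x86 is.

```c
#include <stdint.h>
#include <string.h>

#define MXCSR_DEFAULT_MASK    UINT32_C(0x0000FFBF)	/* used when the field reads 0 */
#define MXCSR_DAZ             UINT32_C(0x00000040)	/* denormals-are-zero, bit 6 */
#define FXSAVE_MXCSR_MASK_OFF 28			/* byte offset in FXSAVE image */

/* Extract MXCSR_MASK from an FXSAVE image, falling back to the default. */
static uint32_t
mxcsr_mask_from_fxsave(const uint8_t fx[512])
{
	uint32_t mask;

	memcpy(&mask, fx + FXSAVE_MXCSR_MASK_OFF, sizeof(mask));
	return mask != 0 ? mask : MXCSR_DEFAULT_MASK;
}

/* Clear any bits the CPU does not support before the value is restored. */
static uint32_t
mxcsr_sanitize(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```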
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
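If machdep.xsave_features reports an XCR0-style component bitmap, which is an assumption here, the value 7 decodes as bits 0-2: x87 (1) + SSE (2) + AVX (4). Below is a small C decoder for the low component bits. The function name is hypothetical. The component names follow the Intel SDM numbering, where bits 3-4 are MPX and bits 5-7 are AVX-512.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* XSAVE state components by bit number (Intel SDM Vol. 1, ch. 13). */
static const char *const xsave_component_names[] = {
	"x87", "SSE", "AVX", "BNDREGS", "BNDCSR",
	"opmask", "ZMM_Hi256", "Hi16_ZMM",
};

/* Write a space-separated list of the known components set in features
 * into buf (len must be > 0). Stops early rather than emit a partial
 * name if buf is too small. Returns the resulting string length. */
static size_t
xsave_features_str(uint64_t features, char *buf, size_t len)
{
	size_t used = 0;
	size_t ncomp = sizeof(xsave_component_names) /
	    sizeof(xsave_component_names[0]);

	buf[0] = '\0';
	for (size_t i = 0; i < ncomp; i++) {
		if ((features & (UINT64_C(1) << i)) == 0)
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
		    used > 0 ? " " : "", xsave_component_names[i]);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the truncated name */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```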
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode).
In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.65 |
|
14-Jun-2020 |
riastradh |
Use static constant rather than stack memset buffer for zero fpregs.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support of XEN in HVM mode.
XENPVH - sources required for support of XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
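The idea of copying only the bytes actually in use can be sketched in C as below. This is a minimal illustration under stated assumptions, not the kernel code: `enum fpu_save_kind`, `fpu_state_size()` and `fpu_state_copy()` are hypothetical names, the sizes assume a 108-byte FNSAVE image (32-bit protected mode) and a fixed 512-byte FXSAVE image, and the XSAVE size is supplied by the caller (on hardware it would come from CPUID leaf 0xD).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical save-mechanism selector, in the spirit of the switch in rev 1.45. */
enum fpu_save_kind {
	FPU_KIND_FSAVE,   /* legacy x87-only FNSAVE/FRSTOR image */
	FPU_KIND_FXSAVE,  /* FXSAVE/FXRSTOR image (x87 + SSE) */
	FPU_KIND_XSAVE    /* XSAVE/XRSTOR image, size depends on enabled features */
};

/* Number of bytes that actually hold live state for a given mechanism. */
static size_t
fpu_state_size(enum fpu_save_kind kind, size_t xsave_size)
{
	switch (kind) {
	case FPU_KIND_FSAVE:
		return 108;        /* FNSAVE area in 32-bit protected mode */
	case FPU_KIND_FXSAVE:
		return 512;        /* fixed-size FXSAVE area */
	case FPU_KIND_XSAVE:
		return xsave_size; /* caller-supplied, e.g. from CPUID leaf 0xD */
	}
	return 0;
}

/*
 * Copy only the bytes in use, rather than the whole maximum-sized
 * save area that the PCB embeds.
 */
static void
fpu_state_copy(void *dst, const void *src, enum fpu_save_kind kind,
    size_t xsave_size)
{
	memcpy(dst, src, fpu_state_size(kind, xsave_size));
}
```

With FNSAVE the copy touches 108 bytes instead of the full buffer, which is the saving the commit describes.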
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
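FXSAVE requires a 16-byte-aligned memory operand and XSAVE a 64-byte-aligned one, so a stack buffer at an address with `rdi % 16 = 8` makes the instruction fault. A minimal C11 sketch of declaring a suitably aligned scratch area and checking its alignment follows; `struct fpu_scratch` and `is_aligned()` are hypothetical names, and the 1024-byte size is an arbitrary assumption rather than the kernel's value.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scratch save area. 64-byte alignment satisfies both the
 * FXSAVE (16-byte) and XSAVE (64-byte) operand requirements.
 */
struct fpu_scratch {
	alignas(64) unsigned char bytes[1024];
};

/* True if p is a multiple of align (align must be a power of two). */
static bool
is_aligned(const void *p, size_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}
```

Putting the alignment on the type, rather than relying on whatever the caller's stack frame happens to provide, lets the compiler realign the stack when such a buffer is a local variable.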
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
- splhigh
- unbusy the current cpu's fpu
- create a new fpu state in memory
- install the state on the current cpu's fpu
- splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
- switch lwp to cpu2
- do float operation
- switch lwp to cpu3
- do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
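The rule "whenever software changes the in-memory FPU state, it sets XSTATE_BV[i]=1" can be illustrated with a small C sketch. It assumes the standard XSAVE layout (a 512-byte legacy region followed by the 64-byte XSAVE header, whose first 8 bytes are XSTATE_BV) and the architectural component numbers (0 = x87, 1 = SSE, 2 = AVX). `xstate_bv_get()` and `xstate_mark_modified()` are hypothetical helpers, not functions from fpu.c.

```c
#include <stdint.h>
#include <string.h>

#define XSAVE_HEADER_OFFSET 512  /* header follows the 512-byte legacy region */

#define XCOMP_X87 0              /* state-component numbers (bit positions) */
#define XCOMP_SSE 1
#define XCOMP_AVX 2

/* Read XSTATE_BV, the first 8 bytes of the XSAVE header. */
static uint64_t
xstate_bv_get(const unsigned char *area)
{
	uint64_t bv;

	memcpy(&bv, area + XSAVE_HEADER_OFFSET, sizeof bv);
	return bv;
}

/*
 * After writing component `comp` in the in-memory image, set its
 * XSTATE_BV bit. If the bit stays clear, XRSTOR puts that component in
 * its initial configuration and ignores the bytes just written.
 */
static void
xstate_mark_modified(unsigned char *area, unsigned comp)
{
	uint64_t bv = xstate_bv_get(area);

	bv |= UINT64_C(1) << comp;
	memcpy(area + XSAVE_HEADER_OFFSET, &bv, sizeof bv);
}
```

The same reasoning explains rev 1.18 further down: a freshly zeroed area has XSTATE_BV = 0, so filled-in x87/SSE structures are ignored by XRSTOR until their bits are set.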
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
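Dynamic detection and masking (see also rev 1.19 below) can be sketched as follows. The sketch assumes the FXSAVE image layout, in which MXCSR_MASK is the 32-bit field at byte offset 28, and the documented rule that a zero MXCSR_MASK means the default mask 0x0000FFBF (DAZ, bit 6, unsupported). The helper names are hypothetical. A hard-coded 0xFFBF would silently drop DAZ on CPUs that support it, while skipping the mask entirely would let reserved bits reach XRSTOR and fault.

```c
#include <stdint.h>
#include <string.h>

#define FXSAVE_MXCSR_MASK_OFFSET 28     /* MXCSR_MASK field in the FXSAVE image */
#define MXCSR_DEFAULT_MASK 0x0000FFBFu  /* used when the field reads as zero */
#define MXCSR_DAZ 0x00000040u           /* denormals-are-zero, bit 6 */

/* Derive the supported-bits mask from an FXSAVE image. */
static uint32_t
mxcsr_mask_from_fxsave(const unsigned char *fxsave_image)
{
	uint32_t mask;

	memcpy(&mask, fxsave_image + FXSAVE_MXCSR_MASK_OFFSET, sizeof mask);
	return mask != 0 ? mask : MXCSR_DEFAULT_MASK;
}

/* Clear unsupported/reserved bits before the value reaches XRSTOR. */
static uint32_t
mxcsr_sanitize(uint32_t user_mxcsr, uint32_t mask)
{
	return user_mxcsr & mask;
}
```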
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev 1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
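Assuming `machdep.xsave_features` is a bitmap in XCR0 bit order, the value 7 decodes to x87 | SSE | AVX, which matches `xsave_size = 832` (512-byte legacy region + 64-byte header + 256 bytes of upper YMM halves). A small C sketch of the decoding; the component names follow the architectural numbering for bits 0-7, and `xsave_features_describe()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Architectural XSAVE state components, indexed by XCR0 bit (0-7). */
static const char *const xcomp_names[] = {
	"x87", "SSE", "AVX", "BNDREGS", "BNDCSR",
	"opmask", "ZMM_Hi256", "Hi16_ZMM",
};

/*
 * Write a comma-separated list of the components set in `features`
 * into buf (len must be > 0; output is truncated to fit).
 */
static void
xsave_features_describe(uint64_t features, char *buf, size_t len)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof xcomp_names / sizeof xcomp_names[0]; i++) {
		if ((features & (UINT64_C(1) << i)) == 0)
			continue;
		if (buf[0] != '\0')
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, xcomp_names[i], len - strlen(buf) - 1);
	}
}
```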
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
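The "short copies" leak arises when the saved image (e.g. a 108-byte FNSAVE area) is smaller than the buffer handed back to userland, so the remaining bytes keep whatever was there before. A minimal sketch of the usual defence, zeroing the whole destination before copying, is below; `fpregs_export()` is a hypothetical helper, not the get_mcontext() code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill a caller-visible buffer from a possibly
 * shorter saved image. The destination is zeroed first, so bytes beyond
 * the copied region never carry stale (e.g. kernel) memory.
 */
static void
fpregs_export(void *dst, size_t dstlen, const void *src, size_t srclen)
{
	size_t n = srclen < dstlen ? srclen : dstlen;

	memset(dst, 0, dstlen);
	memcpy(dst, src, n);
}
```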
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode).
In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.64 |
|
13-Jun-2020 |
riastradh |
Add comments over fpu_kern_enter/leave.
|
#
1.63 |
|
13-Jun-2020 |
riastradh |
Zero the fpu registers on fpu_kern_leave.
Avoid Spectre-class attacks on any values left in them.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. no FPU x86 is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zerod), but I don't think the ymm registers are read/written unless they have been used. The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.62 |
|
04-Jun-2020 |
riastradh |
Call clts/stts in fpu_kern_enter/leave so they work.
|
Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
branches: 1.60.2; Add a small API for in-kernel FPU operations.
fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug: for some reason the stack is misaligned, which causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be; only the xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
- splhigh
- unbusy the current cpu's fpu
- create a new fpu state in memory
- install the state on the current cpu's fpu
- splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, both used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcpy'd into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
switch lwp to cpu2
do float operation
switch lwp to cpu3
do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.61 |
|
31-Jan-2020 |
maxv |
'oldlwp' is never NULL now, so remove the NULL checks.
|
Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base
|
#
1.60 |
|
27-Nov-2019 |
maxv |
Add a small API for in-kernel FPU operations.
fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-0-RC1 netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug: for some reason the stack is misaligned, which causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be; only the xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
- splhigh
- unbusy the current cpu's fpu
- create a new fpu state in memory
- install the state on the current cpu's fpu
- splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, both used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcpy'd into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
switch lwp to cpu2
do float operation
switch lwp to cpu3
do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask MXCSR; otherwise userland could set reserved bits to 1 and make XRSTOR fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the components that were just filled in; otherwise XRSTOR does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev 1.15, xstate_bv was implicitly initialized because the XSAVE area was not zeroed out properly.
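A minimal sketch of what initializing xstate_bv means, assuming a zeroed standard-format XSAVE area whose legacy x87/SSE region has just been filled in: the XSAVE header starts at byte 512, its first 8 bytes are XSTATE_BV, and bit 0 is x87 while bit 1 is SSE. `xsave_mark_legacy_valid` is a hypothetical helper, not the NetBSD function; it accesses the header with memcpy and assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XSAVE_HDR_OFFSET 512          /* header follows the 512-byte legacy region */
#define XSTATE_X87       (1ULL << 0)
#define XSTATE_SSE       (1ULL << 1)

/* Mark the legacy components as present so that XRSTOR loads them
 * from memory instead of resetting them to their initial state. */
static void
xsave_mark_legacy_valid(uint8_t *area)
{
	uint64_t bv;

	memcpy(&bv, area + XSAVE_HDR_OFFSET, sizeof(bv));
	bv |= XSTATE_X87 | XSTATE_SSE;    /* keep any bits already set */
	memcpy(area + XSAVE_HDR_OFFSET, &bv, sizeof(bv));
}
```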
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present: x86 without an FPU is not supported.
Also delete the newly unused send_sigill.
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all CPUs; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
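The sysctl values above are consistent with the standard (non-compacted) XSAVE layout: features 7 is x87 | SSE | AVX, and the size is the 512-byte legacy region plus the 64-byte XSAVE header plus 256 bytes of AVX state (the upper 128 bits of YMM0-15), i.e. 512 + 64 + 256 = 832. The helper below only redoes that arithmetic for this three-component subset; the macro and function names are hypothetical, and real code should take the size from CPUID leaf 0xD rather than hard-coding offsets.

```c
#include <assert.h>
#include <stdint.h>

#define XFEAT_X87 (1u << 0)   /* x87 state, inside the 512-byte legacy region */
#define XFEAT_SSE (1u << 1)   /* SSE state, also inside the legacy region */
#define XFEAT_AVX (1u << 2)   /* upper 128 bits of YMM0-15 */

/* Standard-format XSAVE area size for the x87/SSE/AVX subset only. */
static unsigned
xsave_std_size(uint64_t features)
{
	unsigned size = 512 + 64;        /* legacy region + XSAVE header */

	if (features & XFEAT_AVX)
		size = 576 + 16 * 16;        /* AVX at offset 576: 16 regs x 16 bytes */
	return size;
}
```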
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages; expect to panic in kernel mode).
In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.60 |
|
27-Nov-2019 |
maxv |
Add a small API for in-kernel FPU operations.
    fpu_kern_enter();
    /* do FPU stuff */
    fpu_kern_leave();
|
Revision tags: phil-wifi-20191119
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both requests take a 'struct iovec' pointer rather than a direct argument. This requires the caller to specify the buffer size explicitly. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
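Passing an iovec lets the kernel clamp each copy to the caller's buffer size. The sketch below models only that size negotiation: `struct xstate_v1`, `struct xstate_v2` and `copy_xstate_out` are hypothetical and do not reproduce NetBSD's `struct xstate` layout or the ptrace(2) calling convention. It assumes each newer version only appends fields, so an older structure is a prefix of the newer one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/* Hypothetical older layout, as compiled into an existing debugger. */
struct xstate_v1 {
	uint64_t rfbm;
	unsigned char avx[256];
};

/* Hypothetical newer layout known to the kernel: same prefix, one field appended. */
struct xstate_v2 {
	uint64_t rfbm;
	unsigned char avx[256];
	unsigned char avx512[512];
};

/* Copy out no more than the caller's buffer can hold; return bytes copied. */
static size_t
copy_xstate_out(const struct xstate_v2 *ks, const struct iovec *iov)
{
	size_t n = iov->iov_len < sizeof(*ks) ? iov->iov_len : sizeof(*ks);

	memcpy(iov->iov_base, ks, n);
	return n;
}
```

An old caller passing `sizeof(struct xstate_v1)` gets a correct partial read, while a newer caller with a larger buffer receives the whole structure.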
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
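The variable-sized copy can be sketched as follows. `struct pcb_model`, `FPU_AREA_MAX` and `fork_copy_fpu` are hypothetical stand-ins for the PCB and its embedded save area, not NetBSD's definitions; the point is only that the copy length comes from the active save format (108 bytes for a 32-bit FNSAVE image, 512 for FXSAVE, more for XSAVE) rather than from the size of the static buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FPU_AREA_MAX 4096   /* hypothetical: largest static state embedded in the PCB */

struct pcb_model {
	int other_fields;
	unsigned char fpu_area[FPU_AREA_MAX];
};

/* On fork, copy only the bytes the active save mechanism actually uses. */
static void
fork_copy_fpu(struct pcb_model *child, const struct pcb_model *parent,
    size_t fpu_save_size)
{
	assert(fpu_save_size <= FPU_AREA_MAX);
	memcpy(child->fpu_area, parent->fpu_area, fpu_save_size);
}
```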
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug: for some reason the stack is misaligned, which causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen on current-users@ yesterday, rdi % 16 = 8, and the same was seen several months ago.
The rest of the changes in XSAVE are wrong too, but I'll let him fix those.
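FXSAVE requires its memory operand to be 16-byte aligned, and the XSAVE family requires 64-byte alignment; a misaligned operand faults, which fits the rdi % 16 = 8 observation above. The helper below is a hypothetical sanity check, not code from fpu.c, of the kind one could assert on a save-area pointer before issuing the instruction; it assumes the alignment argument is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FXSAVE_ALIGN 16   /* FXSAVE/FXRSTOR operand alignment */
#define XSAVE_ALIGN  64   /* XSAVE/XSAVEOPT/XRSTOR operand alignment */

/* True if p is a multiple of align (align must be a power of two). */
static bool
fpu_area_aligned(const void *p, uintptr_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}
```

An address such as 0x1008 reproduces the `% 16 = 8` case and fails the FXSAVE check.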
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE, as it should be; only the xen case checks the flag now. Need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
splhigh
unbusy the current cpu's fpu
create a new fpu state in memory
install the state on the current cpu's fpu
splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. no FPU x86 is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zerod), but I don't think the ymm registers are read/written unless they have been used. The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't pasrt of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code thate appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.59 |
|
30-Oct-2019 |
maxv |
Style.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcpy'd into the new PCB, and CR0_TS is set. The next time the forked thread uses the FPU it faults; we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, which forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to XRSTOR is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
    switch lwp to cpu2
    do float operation
    switch lwp to cpu3
    do float operation
Doing this 10^6 times in a loop, my CPU goes on average from 28.2 seconds to 20.8 seconds.
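Below is a simplified model of the two XSAVEOPT optimizations referred to above, as the Intel SDM describes them. A component is not written if it is in its initial configuration (XINUSE[i]=0). It is also skipped if it has not been modified since the most recent XRSTOR from the same address. The structures, the three-component limit and the 16-byte component size are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NCOMP		3	/* x87, SSE, AVX only, for illustration */
#define COMP_SIZE	16	/* toy size; real components differ */

/* Hypothetical register-side tracking state. */
struct xregs_model {
	uint8_t live[NCOMP][COMP_SIZE];
	uint32_t xinuse;		/* bit i: component i not in init state */
	uint32_t modified;		/* bit i: written since the last XRSTOR */
	const void *last_xrstor_addr;	/* address used by that XRSTOR */
};

/* Hypothetical in-memory save area. */
struct xsave_area_model {
	uint8_t comp[NCOMP][COMP_SIZE];
	uint64_t xstate_bv;
};

/*
 * Simplified XSAVEOPT; returns how many components were written to memory.
 * A component is skipped if it is in its init state, or if it is unmodified
 * since an XRSTOR from this very address (memory already holds it).
 */
static int
xsaveopt_model(const struct xregs_model *r, struct xsave_area_model *area,
    uint32_t rfbm)
{
	assert(area != NULL);
	bool same_addr = (r->last_xrstor_addr == (const void *)area);
	int written = 0;

	for (int i = 0; i < NCOMP; i++) {
		uint32_t bit = 1u << i;
		if ((rfbm & bit) == 0)
			continue;
		if ((r->xinuse & bit) == 0)
			continue;		/* init optimization */
		if (same_addr && (r->modified & bit) == 0)
			continue;		/* modified optimization */
		memcpy(area->comp[i], r->live[i], COMP_SIZE);
		written++;
	}
	/* XSTATE_BV[i] = XINUSE[i] for requested components; others unchanged. */
	area->xstate_bv = (area->xstate_bv & ~(uint64_t)rfbm) | (r->xinuse & rfbm);
	return written;
}
```

The model also shows why the rule "we always call XRSTOR first" matters. If software edited the area after the XRSTOR, a later XSAVEOPT to the same address could skip an unmodified component. Memory would then keep the edited bytes rather than the register contents.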
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK: it needs to be detected dynamically, otherwise when masking MXCSR we lose some features (e.g. DAZ).
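Here is a sketch of the masking. It assumes the mask is read from the MXCSR_MASK field (bytes 28 to 31) of an FXSAVE image taken at boot. Per the Intel SDM, a field value of 0 means the default mask 0xFFBF, which clears bit 6 (DAZ). That is how a fixed mask can silently drop DAZ on CPUs that support it. The function names are hypothetical; 0x1F80 is the architectural reset value of MXCSR.

```c
#include <assert.h>
#include <stdint.h>

#define MXCSR_DAZ		0x0040u	/* bit 6: denormals-are-zero */
#define MXCSR_DEFAULT_MASK	0xFFBFu	/* SDM: used when the field reads 0 */

/* 'field' is the MXCSR_MASK value from a boot-time FXSAVE image. */
static uint32_t
effective_mxcsr_mask(uint32_t field)
{
	return field != 0 ? field : MXCSR_DEFAULT_MASK;
}

/* Drop bits the CPU does not implement before they reach (F)XRSTOR. */
static uint32_t
sanitize_mxcsr(uint32_t mxcsr, uint32_t mask)
{
	return mxcsr & mask;
}
```

Applying the detected mask to a user-supplied value also covers rev 1.19: reserved bits are cleared before the value can reach XRSTOR, where they would fault.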
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
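The rule behind this fix, and behind the later "force a reload of CW" changes, is how XRSTOR treats XSTATE_BV. For each requested component, XRSTOR loads the saved bytes only if the matching XSTATE_BV bit is set. Otherwise it puts the component in its initial configuration and ignores memory. The model below applies that rule to the x87 control word alone. The struct and function names are hypothetical; 0x037F is the architectural initial FCW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XSTATE_X87	(UINT64_C(1) << 0)
#define X87_INIT_FCW	0x037Fu		/* FCW in the x87 initial configuration */

/* Hypothetical slice of an XSAVE area: legacy FCW plus the header bitmap. */
struct xsave_fcw_model {
	uint16_t fx_cw;
	uint64_t xstate_bv;
};

/* FCW after a modelled XRSTOR with requested-feature bitmap 'rfbm'. */
static uint16_t
xrstor_fcw_model(uint16_t live_fcw, const struct xsave_fcw_model *area,
    uint64_t rfbm)
{
	assert(area != NULL);
	if ((rfbm & XSTATE_X87) == 0)
		return live_fcw;	/* component not requested: unchanged */
	if (area->xstate_bv & XSTATE_X87)
		return area->fx_cw;	/* loaded from memory */
	return X87_INIT_FCW;		/* initialized; memory ignored */
}
```

So a filled-in fx_cw whose X87 bit in XSTATE_BV is clear is silently replaced by 0x037F.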
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
    machdep.fpu_save = 3 (implies xsaveopt)
    machdep.xsave_size = 832
    machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
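The three sysctl values are consistent with each other. A feature mask of 7 is x87 | SSE | AVX (XCR0 bits 0 to 2). In the standard (non-compacted) XSAVE layout, the 512-byte legacy region and the 64-byte header are followed by the 256-byte AVX component at offset 576. That gives 512 + 64 + 256 = 832 bytes. The sketch below hard-codes only those three components as an assumption; real code should take sizes and offsets from CPUID leaf 0xD instead.

```c
#include <assert.h>
#include <stdint.h>

#define XFEATURE_X87	(1u << 0)
#define XFEATURE_SSE	(1u << 1)
#define XFEATURE_AVX	(1u << 2)

#define XSAVE_LEGACY_SIZE	512u	/* x87 + SSE legacy region */
#define XSAVE_HEADER_SIZE	64u	/* XSTATE_BV, XCOMP_BV, reserved */
#define XSAVE_AVX_OFFSET	576u	/* upper halves of YMM0-YMM15 */
#define XSAVE_AVX_SIZE		256u

/* Standard-format XSAVE area size for a feature mask limited to bits 0-2. */
static uint32_t
xsave_area_size(uint32_t features)
{
	/* Components above AVX: sizes/offsets must come from CPUID 0xD. */
	assert((features & ~(XFEATURE_X87 | XFEATURE_SSE | XFEATURE_AVX)) == 0);

	if (features & XFEATURE_AVX)
		return XSAVE_AVX_OFFSET + XSAVE_AVX_SIZE;
	return XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;
}
```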
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
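The "write to the entire buffer" rule can be sketched with a hypothetical helper. It clears the whole destination before copying the saved image in. A short image, such as the 108-byte FNSAVE format placed in a larger buffer, then leaves zeros behind instead of stale kernel memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a saved FPU image of 'srclen' bytes into a
 * buffer of 'dstlen' bytes destined for userland. The destination is
 * cleared first, so any bytes the image does not cover end up as zero.
 */
static void
fpregs_copy_model(void *dst, size_t dstlen, const void *src, size_t srclen)
{
	assert(dst != NULL && src != NULL);
	memset(dst, 0, dstlen);
	memcpy(dst, src, srclen < dstlen ? srclen : dstlen);
}
```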
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.58 |
|
12-Oct-2019 |
maxv |
Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago.
Bump the kernel version to 9.99.16.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
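The two compatibility mechanisms can be modelled in user space. The struct below is a hypothetical, heavily simplified stand-in for `struct xstate`; the real layout is not reproduced here. The set operation replaces only the components selected by `xs_rfbm`. The get operation copies at most the caller-supplied `iov_len` bytes, so a caller built against a smaller structure still receives a valid prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for 'struct xstate'. */
struct xstate_model {
	uint64_t xs_rfbm;		/* components the caller wants replaced */
	uint8_t comp[3][16];		/* x87, SSE, AVX (toy sizes) */
};

/* PT_SETXSTATE-like: replace only the components selected by xs_rfbm. */
static void
setxstate_model(struct xstate_model *cur, const struct xstate_model *req)
{
	assert(cur != NULL && req != NULL);
	for (int i = 0; i < 3; i++)
		if (req->xs_rfbm & (UINT64_C(1) << i))
			memcpy(cur->comp[i], req->comp[i], sizeof cur->comp[i]);
}

/* PT_GETXSTATE-like: copy out at most iov_len bytes, return bytes copied. */
static size_t
getxstate_model(void *buf, size_t iov_len, const struct xstate_model *cur)
{
	size_t n = iov_len < sizeof *cur ? iov_len : sizeof *cur;

	memcpy(buf, cur, n);
	return n;
}
```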
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename:
    fpu_save_area_clear -> fpu_clear
    fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
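The last two NVMM items follow architectural rules. In 64-bit mode, CPUID clears the upper 32 bits of RAX, RBX, RCX and RDX. RDMSR/WRMSR take the MSR index from ECX and ignore bits 63:32 of RCX. Here is a minimal sketch of both, using a hypothetical guest register struct:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a guest's general-purpose registers. */
struct guest_regs_model {
	uint64_t rax, rbx, rcx, rdx;
};

/* RDMSR/WRMSR: the MSR index is ECX; bits 63:32 of RCX are ignored. */
static uint32_t
msr_index(const struct guest_regs_model *r)
{
	return (uint32_t)r->rcx;
}

/* CPUID in 64-bit mode: 32-bit results, upper halves of RAX..RDX cleared. */
static void
set_cpuid_result(struct guest_regs_model *r, uint32_t eax, uint32_t ebx,
    uint32_t ecx, uint32_t edx)
{
	assert(r != NULL);
	r->rax = eax;	/* uint32_t -> uint64_t conversion zero-extends */
	r->rbx = ebx;
	r->rcx = ecx;
	r->rdx = edx;
}
```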
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug: for some reason the stack is wrongly aligned, which causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8, and as seen several months ago as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
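For reference, FXSAVE/FXRSTOR raise #GP unless their memory operand is 16-byte aligned, and XSAVE/XRSTOR require 64-byte alignment. An area at an address with rdi % 16 = 8 therefore faults under either. The following minimal check uses a hypothetical scratch struct whose C11 `_Alignas(64)` member meets both requirements, regardless of how the enclosing stack frame happens to be aligned:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FXSAVE/FXRSTOR operand must be 16-byte aligned, or #GP. */
static bool
fxsave_area_ok(const void *p)
{
	return ((uintptr_t)p % 16) == 0;
}

/* XSAVE/XRSTOR operand must be 64-byte aligned, or #GP. */
static bool
xsave_area_ok(const void *p)
{
	return ((uintptr_t)p % 64) == 0;
}

/* Hypothetical scratch area; 64-byte alignment satisfies both checks. */
struct fpu_scratch_model {
	_Alignas(64) uint8_t area[512];
};
```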
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be; only the xen case checks the flag now. Need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
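The selection rule described above can be sketched as a function of raw CPUID register values. The bit positions are the architectural ones:

- CPUID.1:EDX bit 24 is FXSR.
- CPUID.1:ECX bit 26 is XSAVE, and bit 27 is OSXSAVE.
- CPUID.(EAX=0DH,ECX=1):EAX bit 0 is XSAVEOPT.

The enum and function names are hypothetical and are not claimed to match NetBSD's machdep.fpu_save values. OSXSAVE mirrors CR4.OSXSAVE, so it reflects whether XSAVE has actually been enabled. That is presumably why it is a reliable indication under Xen.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID1_EDX_FXSR		(1u << 24)
#define CPUID1_ECX_XSAVE	(1u << 26)
#define CPUID1_ECX_OSXSAVE	(1u << 27)
#define CPUIDD1_EAX_XSAVEOPT	(1u << 0)	/* leaf 0xD, sub-leaf 1 */

/* Hypothetical names for the save mechanisms. */
enum fpu_save_model { SAVE_FNSAVE, SAVE_FXSAVE, SAVE_XSAVE, SAVE_XSAVEOPT };

/* Use XSAVE only if the CPU supports it AND the OS has enabled it. */
static enum fpu_save_model
choose_fpu_save(uint32_t cpuid1_ecx, uint32_t cpuid1_edx, uint32_t cpuidd1_eax)
{
	if ((cpuid1_ecx & CPUID1_ECX_XSAVE) && (cpuid1_ecx & CPUID1_ECX_OSXSAVE))
		return (cpuidd1_eax & CPUIDD1_EAX_XSAVEOPT) ?
		    SAVE_XSAVEOPT : SAVE_XSAVE;
	if (cpuid1_edx & CPUID1_EDX_FXSR)
		return SAVE_FXSAVE;
	return SAVE_FNSAVE;
}
```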
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
    splhigh
    unbusy the current cpu's fpu
    create a new fpu state in memory
    install the state on the current cpu's fpu
    splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. - The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. no FPU x86 is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zerod), but I don't think the ymm registers are read/written unless they have been used. The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see: machdep.fpu_save = 3 (implies xsaveopt) machdep.xsave_size = 832 machdep.xsave_features = 7 Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't pasrt of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code thate appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna(): - Don't actually enable hardware interrupts unless we need to allow in IPIs. - There is no point in enabling them when they are blocked in software (by splhigh()). - Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.57 |
|
04-Oct-2019 |
maxv |
Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
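To show why restoring the FPU state during the context switch keeps the previous lwp's values out of the FPU (the issue behind INTEL-SA-00145), here is a toy userland model in C. It is not kernel code: `struct toy_cpu`, `toy_switch()`, `toy_dna_trap()` and the single integer standing in for the whole FPU register file are hypothetical simplifications. In lazy mode the switch only arms a "trap on next FPU use" flag (the CR0_TS analogue), so the old owner's value stays in the register until the DNA trap; in eager mode the incoming lwp's state is loaded right away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one "FPU register" per CPU, one saved copy per lwp. */
struct toy_lwp {
	int saved_fpu;		/* in-memory copy of this lwp's FPU state */
};

struct toy_cpu {
	int fpu_reg;		/* what is physically in the FPU right now */
	struct toy_lwp *owner;	/* lwp whose state fpu_reg belongs to */
	bool ts;		/* CR0_TS analogue: next FPU use traps */
};

/* Context switch to 'next', in eager or lazy mode. */
void
toy_switch(struct toy_cpu *ci, struct toy_lwp *next, bool eager)
{
	if (eager) {
		/* Save the outgoing state, then load the incoming one now. */
		if (ci->owner != NULL)
			ci->owner->saved_fpu = ci->fpu_reg;
		ci->fpu_reg = next->saved_fpu;
		ci->owner = next;
		ci->ts = false;
	} else {
		/* Lazy: leave the old contents in place; trap on first use
		 * unless the FPU already holds next's state. */
		ci->ts = (ci->owner != next);
	}
}

/* Lazy-mode DNA trap: swap the states on the first FPU use. */
void
toy_dna_trap(struct toy_cpu *ci, struct toy_lwp *curlwp)
{
	assert(ci->ts);
	if (ci->owner != NULL)
		ci->owner->saved_fpu = ci->fpu_reg;
	ci->fpu_reg = curlwp->saved_fpu;
	ci->owner = curlwp;
	ci->ts = false;
}
```

In the lazy run, lwp a's value is still physically in the FPU after switching to lwp b and only leaves it at the DNA trap; in the eager run it is saved and replaced during the switch itself.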
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, which are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcopied into the new PCB, and CR0_TS is set. The next time the forked thread uses the FPU it will fault; we then migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
  switch lwp to cpu2
  do float operation
  switch lwp to cpu3
  do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
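The XINUSE/XSTATE_BV bookkeeping described above can be modelled with plain bitmasks. This is a simplified sketch of the architectural rules as summarized in the Intel SDM and in this entry, not NetBSD code; every function name is hypothetical. XRSTOR sets XINUSE for each requested component from XSTATE_BV (a 0 bit means "load the initial configuration"). XSAVEOPT never needs to write a component whose XINUSE bit is 0, and may also skip one that is unmodified since the last XRSTOR from the same address. It records XSTATE_BV as RFBM & XINUSE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XCOMP_X87	(1ULL << 0)
#define XCOMP_SSE	(1ULL << 1)
#define XCOMP_AVX	(1ULL << 2)

/* Hypothetical model of XINUSE after an XRSTOR of the 'rfbm' components. */
uint64_t
model_xrstor_xinuse(uint64_t xinuse, uint64_t rfbm, uint64_t xstate_bv)
{
	/* Outside RFBM nothing changes; inside it, XSTATE_BV[i]=1 means
	 * "load from memory" (XINUSE[i]=1), 0 means "initialize". */
	return (xinuse & ~rfbm) | (xstate_bv & rfbm);
}

/* Components XSAVEOPT must write; anything else it may skip. */
uint64_t
model_xsaveopt_must_write(uint64_t rfbm, uint64_t xinuse, uint64_t modified,
    bool same_address_as_last_xrstor)
{
	uint64_t live = rfbm & xinuse;	/* init-state components are skipped */

	if (!same_address_as_last_xrstor)
		return live;		/* modified optimization cannot apply */
	return live & modified;
}

/* XSTATE_BV as recorded by XSAVEOPT: bits in RFBM take XINUSE's value. */
uint64_t
model_xsaveopt_xstate_bv(uint64_t old_bv, uint64_t rfbm, uint64_t xinuse)
{
	return (old_bv & ~rfbm) | (rfbm & xinuse);
}
```

With x87 left in its initial configuration (the reason fpu_save_area_clear no longer forces a CW reload), XINUSE[0] stays 0 and this model has nothing to write for x87. Keeping the save address fixed per LWP is what lets the modified optimization apply.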
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK: it needs to be detected dynamically, otherwise when masking MXCSR we lose some features (e.g. DAZ).
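A minimal sketch of the dynamic detection, assuming an FXSAVE image has already been captured (executing FXSAVE itself is left out so the code runs anywhere). Per the Intel SDM, FXSAVE stores MXCSR_MASK in bytes 28 to 31 of the 512-byte area. A value of 0 means the CPU reports no mask and the default 0x0000FFBF applies. That default has DAZ (bit 6) clear, which is why a hard-coded default loses DAZ on CPUs that support it. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FXSAVE_AREA_SIZE	512
#define FXSAVE_MXCSR_MASK_OFF	28		/* MXCSR_MASK: bytes 28..31 */
#define MXCSR_DEFAULT_MASK	0x0000FFBFu	/* SDM default when the field is 0 */
#define MXCSR_DAZ		(1u << 6)	/* denormals-are-zero */

static_assert(FXSAVE_MXCSR_MASK_OFF + 4 <= FXSAVE_AREA_SIZE,
    "MXCSR_MASK must lie inside the FXSAVE area");

/*
 * Hypothetical helper: read MXCSR_MASK from an FXSAVE image captured
 * elsewhere. The field is little-endian, as on all x86 CPUs.
 */
uint32_t
mxcsr_mask_from_fxsave(const uint8_t area[FXSAVE_AREA_SIZE])
{
	const uint8_t *p = area + FXSAVE_MXCSR_MASK_OFF;
	uint32_t mask = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;

	/* A zero field means the CPU did not report a mask. */
	return mask != 0 ? mask : MXCSR_DEFAULT_MASK;
}
```

On a CPU that reports 0x0000FFFF, the detected mask keeps DAZ, while the fallback would silently drop it.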
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
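The masking itself is a single AND. Here is a minimal sketch, assuming the mask was detected beforehand. The function name is hypothetical; in the kernel the value would come from a user-supplied context such as setcontext.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: clear reserved MXCSR bits in a value coming from userland. */
uint32_t
sanitize_user_mxcsr(uint32_t user_mxcsr, uint32_t mxcsr_mask)
{
	/* A zero mask would mean detection never ran. */
	assert(mxcsr_mask != 0);
	/* Bits clear in the mask are reserved; loading them would fault. */
	return user_mxcsr & mxcsr_mask;
}
```

Masking before the state reaches XRSTOR means a hostile or buggy context can only set the bits the CPU actually supports. A legitimate value such as the power-on default 0x1F80 passes through unchanged.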
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev 1.15, xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
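A sketch of what "initialize xstate_bv" means in memory, assuming the standard XSAVE layout from the Intel SDM. The 512-byte legacy (FXSAVE-compatible) region is followed by a 64-byte header whose first 8 bytes are XSTATE_BV. If the kernel fills the x87/SSE fields by hand and leaves bits 0 and 1 clear, XRSTOR loads the initial configuration for those components instead. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XSAVE_HDR_OFF	512	/* header follows the 512-byte legacy region */
#define XCOMP_X87	(1ULL << 0)
#define XCOMP_SSE	(1ULL << 1)

/* Hypothetical: mark components as valid after filling them in memory. */
void
xsave_mark_filled(uint8_t *area, size_t size, uint64_t components)
{
	uint64_t bv;

	assert(size >= XSAVE_HDR_OFF + 64);	/* legacy region + header */
	memcpy(&bv, area + XSAVE_HDR_OFF, sizeof(bv));
	bv |= components;
	memcpy(area + XSAVE_HDR_OFF, &bv, sizeof(bv));
}
```

Using memcpy for the 8-byte field avoids alignment and aliasing issues when the area is a plain byte buffer.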
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present; x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
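The reported values can be cross-checked against the standard XSAVE layout in the Intel SDM. xsave_features = 7 is bits 0 (x87), 1 (SSE) and 2 (AVX). The legacy region (512) plus the header (64) ends at offset 576, and the AVX component (upper halves of YMM0-15) is 256 bytes at offset 576, giving 576 + 256 = 832. The sketch below only knows these three components; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XSAVE_LEGACY_SIZE	512	/* FXSAVE-compatible region (x87 + SSE) */
#define XSAVE_HDR_SIZE		64	/* XSAVE header, holds XSTATE_BV */
#define XSAVE_AVX_OFF		576	/* upper halves of YMM0-15 */
#define XSAVE_AVX_SIZE		256

static const char *const xcomp_names[] = { "x87", "SSE", "AVX" };
#define XCOMP_KNOWN (sizeof(xcomp_names) / sizeof(xcomp_names[0]))

/* Fill 'out' with the names of the known components set in 'features'. */
size_t
xsave_feature_names(uint64_t features, const char **out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < XCOMP_KNOWN && n < max; i++)
		if (features & (1ULL << i))
			out[n++] = xcomp_names[i];
	return n;
}

/* Standard (non-compacted) XSAVE size for the three components above. */
size_t
xsave_size_for(uint64_t features)
{
	size_t size = XSAVE_LEGACY_SIZE + XSAVE_HDR_SIZE;

	assert((features & 1) != 0);	/* x87 must always be enabled in XCR0 */
	if (features & (1ULL << 2))
		size = XSAVE_AVX_OFF + XSAVE_AVX_SIZE;
	return size;
}
```

For features = 7 this gives 832, matching machdep.xsave_size above; without AVX (features = 3) the area would be 576 bytes.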
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages; expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.56 |
|
03-Oct-2019 |
maxv |
Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
|
Revision tags: netbsd-9-base
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
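The two compatibility properties described above can be illustrated with a userland model. It is not NetBSD's ptrace code: `struct toy_xstate`, `toy_getxstate()` and `toy_setxstate()` are hypothetical stand-ins, and only `struct iovec` (from `<sys/uio.h>`) is a real type. GET copies out at most `iov_len` bytes, so a caller built against a shorter structure still gets a valid prefix. SET replaces only the components whose bit is set in `xs_rfbm`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical stand-in for 'struct xstate': a mask plus per-component slots. */
#define TOY_NCOMP 3
struct toy_xstate {
	uint64_t xs_rfbm;		/* components the caller wants to set */
	uint8_t comp[TOY_NCOMP][16];	/* toy register contents */
};

/* GET: copy out at most iov_len bytes; shorter buffers get a prefix. */
size_t
toy_getxstate(const struct toy_xstate *kern, struct iovec *iov)
{
	size_t len = iov->iov_len < sizeof(*kern) ? iov->iov_len : sizeof(*kern);

	memcpy(iov->iov_base, kern, len);
	return len;
}

/* SET: replace only the components whose bit is set in xs_rfbm. */
void
toy_setxstate(struct toy_xstate *kern, const struct toy_xstate *user)
{
	for (int i = 0; i < TOY_NCOMP; i++)
		if (user->xs_rfbm & (1ULL << i))
			memcpy(kern->comp[i], user->comp[i], sizeof(kern->comp[i]));
}
```

A partial update that sets only bit 1 in `xs_rfbm` leaves components 0 and 2 untouched, which is what makes incremental debugger writes safe.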
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
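A minimal sketch of the variable-sized copy, assuming a PCB that embeds an area large enough for the biggest state while the active save mechanism uses fewer bytes. `struct toy_pcb`, `TOY_FPU_AREA_MAX` and `toy_fork_fpu()` are hypothetical; `fpu_save_size` stands in for whatever runtime size the kernel selected. The 108-byte figure used below is the size of the 32-bit FNSAVE image.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_FPU_AREA_MAX 4096	/* biggest state the PCB can embed (hypothetical) */

struct toy_pcb {
	unsigned char fpu_area[TOY_FPU_AREA_MAX];
};

/* Copy only the bytes the active save mechanism actually uses. */
void
toy_fork_fpu(struct toy_pcb *child, const struct toy_pcb *parent,
    size_t fpu_save_size)
{
	assert(fpu_save_size <= TOY_FPU_AREA_MAX);
	memcpy(child->fpu_area, parent->fpu_area, fpu_save_size);
}
```

With an FNSAVE-sized state only the first 108 bytes are copied instead of the whole embedded area.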
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, namely that, for some reason, the stack is misaligned, which causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
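FXSAVE requires a 16-byte-aligned operand and XSAVE a 64-byte-aligned one, so "rdi % 16 = 8" is exactly the faulting case. The sketch below shows a generic alignment check and a local buffer that requests its alignment explicitly with C11 `alignas` rather than assuming it from the caller's frame. The names `is_aligned` and `check_local_area` are hypothetical. Requesting 64 bytes also satisfies FXSAVE's 16-byte requirement.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define FXSAVE_ALIGN	16	/* FXSAVE/FXRSTOR operand alignment */
#define XSAVE_ALIGN	64	/* XSAVE/XRSTOR operand alignment */

/* True if 'p' satisfies the given power-of-two alignment. */
bool
is_aligned(const void *p, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t)p & (align - 1)) == 0;
}

/* Request the alignment explicitly instead of assuming it. */
void
check_local_area(void)
{
	alignas(XSAVE_ALIGN) unsigned char area[512];

	assert(is_aligned(area, FXSAVE_ALIGN));
	assert(is_aligned(area, XSAVE_ALIGN));
}
```

A pointer that is 8 bytes past a 16-byte boundary fails the check, which mirrors the rdi value quoted above.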
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independent of XSAVE as it should be; only the xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
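In CPUID leaf 1, ECX bit 26 (XSAVE) says the CPU implements the instructions. Bit 27 (OSXSAVE) reflects CR4.OSXSAVE, meaning XSAVE has actually been enabled. Under Xen PV the guest does not control CR4 directly, which is presumably why OSXSAVE is the more reliable indication there. The sketch decodes an ECX value passed in; obtaining it (for example with GCC/Clang's `<cpuid.h>`) is left out so the code runs on any host. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_XSAVE	(1u << 26)	/* CPU implements XSAVE/XRSTOR */
#define CPUID1_ECX_OSXSAVE	(1u << 27)	/* CR4.OSXSAVE set: XSAVE enabled */

/* Decide whether XSAVE may be used, given CPUID.1:ECX. */
bool
can_use_xsave(uint32_t cpuid1_ecx)
{
	/* CR4.OSXSAVE can only be set when XSAVE exists, so bit 27 suffices. */
	return (cpuid1_ecx & CPUID1_ECX_OSXSAVE) != 0;
}
```

A CPU that advertises XSAVE but runs with it disabled (the no-xsave case) reports bit 26 without bit 27, and the check correctly declines to use XSAVE.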
|
#
1.55 |
|
05-Jul-2019 |
maxv |
More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw().
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna, not fputrap. Also relax it: panic only if the fpu already has a state. We're fine with getting a DNA; what we're not fine with is receiving the DNA while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, both used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcpy'd into the new PCB, and CR0_TS is set. The next time the forked thread uses the FPU it will fault; we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
- switch lwp to cpu2
- do float operation
- switch lwp to cpu3
- do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev 1.15, xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in the kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.54 |
|
26-Jun-2019 |
mgorny |
Implement PT_GETXSTATE and PT_SETXSTATE
Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility.
PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets).
PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates.
Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
|
Revision tags: phil-wifi-20190609
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
branches: 1.43.2; Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They didn't make any sense and were trying to hide a real bug: for some reason the stack is misaligned, which causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen on current-users@ yesterday, rdi % 16 = 8, and the same was seen several months ago.
The rest of the changes in XSAVE are wrong too, but I'll let him fix those.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
As a stop-gap, make fpuinit_mxcsr_mask() independent of XSAVE on native, as it should be; only the Xen case checks the flag now. Need to investigate further why exactly the fault happens in the Xen no-xsave case.
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
Fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication.
Tested with Xen 4.2.6 DOM0/DOMU on an Intel CPU, with and without the no-xsave flag, so it should also work on those AMD CPUs which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3.
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
- splhigh
- unbusy the current cpu's fpu
- create a new fpu state in memory
- install the state on the current cpu's fpu
- splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna, not fputrap. Also relax it: panic only if the fpu already has a state. We're fine with getting a DNA; what we're not fine with is receiving the DNA while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, both used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcpy'd into the new PCB, and CR0_TS is set. The next time the forked thread uses the FPU it will fault; we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
- switch lwp to cpu2
- do float operation
- switch lwp to cpu3
- do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev 1.15, xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in the kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.53 |
|
25-May-2019 |
maxv |
Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They didn't make any sense and were trying to hide a real bug: for some reason the stack is misaligned, which causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen on current-users@ yesterday, rdi % 16 = 8, and the same was seen several months ago.
The rest of the changes in XSAVE are wrong too, but I'll let him fix those.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
As a stop-gap, make fpuinit_mxcsr_mask() independent of XSAVE on native, as it should be; only the Xen case checks the flag now. Need to investigate further why exactly the fault happens in the Xen no-xsave case.
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
Fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be a reliable indication.
Tested with Xen 4.2.6 DOM0/DOMU on an Intel CPU, with and without the no-xsave flag, so it should also work on those AMD CPUs which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3.
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
- splhigh
- unbusy the current cpu's fpu
- create a new fpu state in memory
- install the state on the current cpu's fpu
- splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna, not fputrap. Also relax it: panic only if the fpu already has a state. We're fine with getting a DNA; what we're not fine with is receiving the DNA while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is copied into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault; we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
- switch lwp to cpu2
- do float operation
- switch lwp to cpu3
- do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
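Why a fixed mask loses DAZ can be shown with a short sketch, assuming the Intel SDM convention: FXSAVE stores MXCSR_MASK at byte offset 28 of its area, and a value of 0 there means the default mask 0x0000FFBF applies, which has bit 6 (DAZ) clear. The helper names are hypothetical; the masking step is the one rev 1.19 describes for keeping userland from setting reserved MXCSR bits.

```c
#include <assert.h>
#include <stdint.h>

#define MXCSR_DAZ          0x0040u /* bit 6: denormals-are-zero */
#define MXCSR_DEFAULT_MASK 0xFFBFu /* SDM fallback when MXCSR_MASK reads 0; DAZ clear */

/*
 * Hypothetical helper: choose the effective mask from the MXCSR_MASK field
 * that FXSAVE wrote. Zero means the CPU reports no mask and the
 * architectural default applies.
 */
static uint32_t
effective_mxcsr_mask(uint32_t fxsave_mxcsr_mask)
{
	return fxsave_mxcsr_mask != 0 ? fxsave_mxcsr_mask : MXCSR_DEFAULT_MASK;
}

/* Hypothetical helper: drop unsupported bits from a user-supplied MXCSR so
 * that a later FXRSTOR/XRSTOR does not fault. */
static uint32_t
sanitize_user_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
	assert(mask != 0);
	return user_mxcsr & mask;
}
```

For example, a user-supplied MXCSR of 0x1FC0 (the default 0x1F80 plus DAZ) keeps DAZ under a detected mask of 0xFFFF, but comes out as 0x1F80 under the default mask.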
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev 1.15, xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
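The ordering this entry describes can be sketched as follows, assuming the standard (non-compacted) XSAVE layout, in which the 64-byte header starts at offset 512 and its first 8 bytes are XSTATE_BV. struct xsave_area, xstate_bv and xsave_area_fill are hypothetical names, not the fpu.c structures, and the memcpy calls assume a little-endian host, as on x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard-format XSAVE layout (Intel SDM, XSAVE-managed state). */
#define XSAVE_HDR_OFFSET 512 /* header follows the 512-byte legacy region */
#define XSAVE_AREA_SIZE  576 /* legacy region + 64-byte header (x87/SSE only) */
#define FX_FCW_OFFSET      0 /* x87 control word */
#define FX_MXCSR_OFFSET   24 /* MXCSR */

#define XSTATE_X87 (UINT64_C(1) << 0)
#define XSTATE_SSE (UINT64_C(1) << 1)

/* Hypothetical save area; XRSTOR requires 64-byte alignment. */
struct xsave_area {
	_Alignas(64) uint8_t bytes[XSAVE_AREA_SIZE];
};

/* Read XSTATE_BV, the first 8 bytes of the header. */
static uint64_t
xstate_bv(const struct xsave_area *a)
{
	uint64_t bv;

	memcpy(&bv, a->bytes + XSAVE_HDR_OFFSET, sizeof(bv));
	return bv;
}

/*
 * Hypothetical: build the area from a user-supplied context. Zeroing leaves
 * XSTATE_BV at 0, which tells XRSTOR to put every component in its init
 * state. After storing the x87 and SSE values, the matching XSTATE_BV bits
 * are set, otherwise XRSTOR initializes those components instead of loading
 * the stored values (MXCSR follows its own rule in the SDM).
 */
static void
xsave_area_fill(struct xsave_area *a, uint16_t fcw, uint32_t mxcsr)
{
	uint64_t bv = XSTATE_X87 | XSTATE_SSE;

	assert(a != NULL);
	memset(a->bytes, 0, sizeof(a->bytes));
	memcpy(a->bytes + FX_FCW_OFFSET, &fcw, sizeof(fcw));
	memcpy(a->bytes + FX_MXCSR_OFFSET, &mxcsr, sizeof(mxcsr));
	memcpy(a->bytes + XSAVE_HDR_OFFSET, &bv, sizeof(bv));
}
```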
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete newly unused send_sigill
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
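For reference, the x87 control word that fpu_set_default_cw() installs packs the exception masks and the precision and rounding controls into one 16-bit value. The decoder below is a hypothetical helper following the Intel SDM bit layout; 0x037F is the value FNINIT loads, while the values each emulation actually passes are not recorded in this log.

```c
#include <assert.h>
#include <stdint.h>

/* x87 control word (FCW) fields, per the Intel SDM. */
#define FCW_EXC_MASKS 0x003Fu /* bits 0-5: IM DM ZM OM UM PM */
#define FCW_PC_SHIFT  8       /* bits 8-9: precision control */
#define FCW_RC_SHIFT  10      /* bits 10-11: rounding control */

/* Precision control: 0 = 24-bit, 2 = 53-bit, 3 = 64-bit significand (1 reserved). */
static unsigned
fcw_precision(uint16_t cw)
{
	return (cw >> FCW_PC_SHIFT) & 0x3u;
}

/* Rounding control: 0 = nearest, 1 = down, 2 = up, 3 = toward zero. */
static unsigned
fcw_rounding(uint16_t cw)
{
	return (cw >> FCW_RC_SHIFT) & 0x3u;
}

/* Nonzero if all six x87 exceptions are masked. */
static int
fcw_all_exceptions_masked(uint16_t cw)
{
	return (cw & FCW_EXC_MASKS) == FCW_EXC_MASKS;
}
```

Decoded this way, 0x037F means all exceptions masked, 64-bit precision and round-to-nearest.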
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
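The reported numbers are consistent with the standard XSAVE layout: xsave_features = 7 enables x87, SSE and AVX (bits 0 to 2), and the area is the 512-byte legacy region plus the 64-byte header plus 256 bytes for the upper halves of ymm0-ymm15, i.e. 832 bytes. A hypothetical helper that reproduces this for those three components only:

```c
#include <assert.h>
#include <stdint.h>

#define XCR0_X87 (UINT32_C(1) << 0)
#define XCR0_SSE (UINT32_C(1) << 1)
#define XCR0_YMM (UINT32_C(1) << 2) /* AVX: upper halves of ymm0-ymm15 */

#define XSAVE_LEGACY_SIZE 512u /* x87 + SSE state, FXSAVE-compatible */
#define XSAVE_HEADER_SIZE  64u
#define XSAVE_YMM_OFFSET  576u /* standard-format offset of the AVX component */
#define XSAVE_YMM_SIZE    256u /* 16 registers x 16 bytes */

/*
 * Hypothetical helper covering only x87, SSE and AVX. x87 and SSE live in
 * the legacy region, so on their own they never grow the area.
 */
static uint32_t
xsave_area_size(uint32_t features)
{
	assert((features & ~(XCR0_X87 | XCR0_SSE | XCR0_YMM)) == 0);
	if (features & XCR0_YMM)
		return XSAVE_YMM_OFFSET + XSAVE_YMM_SIZE;
	return XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;
}
```

Later components, such as the AVX-512 state mentioned in rev 1.9, make this arithmetic insufficient; CPUID leaf 0xD reports the size needed for the enabled features directly.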
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode). In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERTS() firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.52 |
|
19-May-2019 |
maxv |
Rename
fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset
Clearer, and reduces a future diff. No real functional change.
|
#
1.51 |
|
19-May-2019 |
maxv |
Misc changes in the x86 FPU code. Reduces a future diff. No real functional change.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
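A sketch of the idea in this entry and in rev 1.45 below: choose the size of the live FPU state with a switch on the save mechanism, then copy only that many bytes out of a buffer sized for the largest state. All names, and the 4096-byte bound, are hypothetical; 108 bytes is the FNSAVE image in 32-bit protected mode and 512 bytes the FXSAVE image.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical save mechanisms. */
enum fpu_save_kind { SAVE_FSAVE, SAVE_FXSAVE, SAVE_XSAVE };

#define FSAVE_SIZE   108  /* FNSAVE, 32-bit protected-mode format */
#define FXSAVE_SIZE  512
#define PCB_FPU_MAX 4096  /* hypothetical static upper bound in the PCB */

/* Hypothetical PCB embedding the largest possible FPU state. */
struct pcb_sketch {
	_Alignas(64) unsigned char fpu[PCB_FPU_MAX];
};

/* Size of the state actually in use; xsave_size would come from CPUID at boot. */
static size_t
fpu_save_size(enum fpu_save_kind kind, size_t xsave_size)
{
	switch (kind) {
	case SAVE_FSAVE:
		return FSAVE_SIZE;
	case SAVE_FXSAVE:
		return FXSAVE_SIZE;
	case SAVE_XSAVE:
	default:
		return xsave_size;
	}
}

/* Copy only the live bytes (for example on fork), not the whole buffer. */
static void
fpu_area_copy(struct pcb_sketch *dst, const struct pcb_sketch *src, size_t size)
{
	assert(size <= sizeof(dst->fpu));
	memcpy(dst->fpu, src->fpu, size);
}
```

With FNSAVE in use, only 108 bytes are copied instead of the full 4096-byte buffer.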
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, namely that, for some reason, the stack is misaligned, which causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
Revision tags: isaki-audio2-base
|
#
1.50 |
|
11-Feb-2019 |
cherry |
We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support. XENPV - sources required for support of XEN in PV mode. XENPVHVM - sources required for support for XEN in HVM mode. XENPVH - sources required for support for XEN in PVH mode.
|
Revision tags: pgoyette-compat-20190127
|
#
1.49 |
|
20-Jan-2019 |
maxv |
Improvements in NVMM
* Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory.
* Hide RDTSCP.
* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
* Take ECX and not RCX on MSR instructions.
|
Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
|
#
1.48 |
|
05-Oct-2018 |
maxv |
export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
|
Revision tags: pgoyette-compat-0930
|
#
1.47 |
|
17-Sep-2018 |
maxv |
Reduce the noise, reorder and rename some things for clarity.
|
Revision tags: pgoyette-compat-0906 pgoyette-compat-0728
|
#
1.46 |
|
01-Jul-2018 |
maxv |
Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes.
|
#
1.45 |
|
01-Jul-2018 |
maxv |
Use a switch, we can (and will) optimize each case separately. No functional change.
|
#
1.44 |
|
29-Jun-2018 |
maxv |
Add more KASSERTs.
Should help PR/53399.
|
Revision tags: phil-wifi-base pgoyette-compat-0625
|
#
1.43 |
|
23-Jun-2018 |
maxv |
Add XXX in fpuinit_mxcsr_mask.
|
#
1.42 |
|
22-Jun-2018 |
maxv |
Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well.
The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones.
|
#
1.41 |
|
20-Jun-2018 |
jdolecek |
as a stop-gap, make fpuinit_mxcsr_mask() for native independant of XSAVE as it should be, only xen case checks the flag now; need to investigate further why exactly the fault happens for the xen no-xsave case
pointed out by maxv
|
#
1.40 |
|
19-Jun-2018 |
jdolecek |
fix FPU initialization on Xen to allow e.g. AVX when supported by hardware; only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be reliable indication
tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag, so should work also on those AMD CPUs, which have XSAVE disabled by default; also tested with Xen DOM0 4.8.3
fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address
XXX pullup netbsd-8
|
#
1.39 |
|
19-Jun-2018 |
maxv |
When using EagerFPU, create the fpu state in execve at IPL_HIGH.
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state.
The procedure becomes:
splhigh unbusy the current cpu's fpu create a new fpu state in memory install the state on the current cpu's fpu splx
Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle.
In LazyFPU mode we drop IPL_HIGH right away.
Add more KASSERTs.
|
#
1.38 |
|
18-Jun-2018 |
maxv |
Add more KASSERTs, see if they help PR/53383.
|
#
1.37 |
|
17-Jun-2018 |
maxv |
No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy.
I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably.
|
#
1.36 |
|
16-Jun-2018 |
maxv |
Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled.
Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true'].
Also add KASSERTs.
|
#
1.35 |
|
16-Jun-2018 |
maxv |
Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway).
|
#
1.34 |
|
14-Jun-2018 |
maxv |
Install the FPU state on the current CPU in setregs (execve).
|
#
1.33 |
|
14-Jun-2018 |
maxv |
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966.
|
#
1.32 |
|
23-May-2018 |
maxv |
Add a comment about recent AMD CPUs.
|
#
1.31 |
|
23-May-2018 |
maxv |
Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too.
|
#
1.30 |
|
23-May-2018 |
maxv |
Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
|
#
1.29 |
|
23-May-2018 |
maxv |
style
|
Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
|
#
1.28 |
|
09-Feb-2018 |
maxv |
branches: 1.28.2; Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcpy'd into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault; we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
  switch lwp to cpu2
  do float operation
  switch lwp to cpu3
  do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete the newly unused send_sigill.
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
  machdep.fpu_save = 3 (implies xsaveopt)
  machdep.xsave_size = 832
  machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in the kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode).
In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.28 |
|
09-Feb-2018 |
maxv |
Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcpy'd into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault; we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
  switch lwp to cpu2
  do float operation
  switch lwp to cpu3
  do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete the newly unused send_sigill.
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
  machdep.fpu_save = 3 (implies xsaveopt)
  machdep.xsave_size = 832
  machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in the kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode).
In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
Revision tags: tls-maxphys-base-20171202
|
#
1.27 |
|
11-Nov-2017 |
maxv |
Recommit
http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html
but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu.
|
#
1.26 |
|
11-Nov-2017 |
bouyer |
Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html, it breaks Xen: http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcpy'd into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault; we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
  switch lwp to cpu2
  do float operation
  switch lwp to cpu3
  do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete the newly unused send_sigill.
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
branches: 1.12.8; Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
  machdep.fpu_save = 3 (implies xsaveopt)
  machdep.xsave_size = 832
  machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in the kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe. Move the checks for fpu traps in kernel into x86/fpu.c. Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode).
In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.25 |
|
08-Nov-2017 |
maxv |
Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen.
|
#
1.24 |
|
04-Nov-2017 |
maxv |
Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory.
Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcpy'd into the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault; we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR, XSTATE_BV still contains the initial values, and it forces a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a given LWP.
fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state.
Small benchmark:
  switch lwp to cpu2
  do float operation
  switch lwp to cpu3
  do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28.2 seconds to 20.8 seconds.
|
#
1.23 |
|
04-Nov-2017 |
maxv |
Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value.
|
#
1.22 |
|
04-Nov-2017 |
maxv |
Fix xen. Not tested, but seems fine enough.
|
#
1.21 |
|
03-Nov-2017 |
maxv |
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ).
|
#
1.20 |
|
31-Oct-2017 |
maxv |
Zero out the buffer entirely.
|
#
1.19 |
|
31-Oct-2017 |
maxv |
Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault.
|
#
1.18 |
|
31-Oct-2017 |
maxv |
Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before.
Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly.
|
#
1.17 |
|
31-Oct-2017 |
maxv |
Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
|
#
1.16 |
|
31-Oct-2017 |
maxv |
Always use x86_fpu_save, clearer.
|
#
1.15 |
|
31-Oct-2017 |
maxv |
Remove comments that are more misleading than anything else. While here make sure we zero out the FPU area entirely, and not just its legacy region.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete the newly unused send_sigill.
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.10; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX). The result is only written to sysctl nodes at the moment. I see:
  machdep.fpu_save = 3 (implies xsaveopt)
  machdep.xsave_size = 832
  machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in the kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe.
Move the checks for fpu traps in kernel into x86/fpu.c.
Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode).
In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
#
1.14 |
|
09-Oct-2017 |
maya |
GC i386_fpu_present. x86 without an FPU is not supported.
Also delete the newly unused send_sigill.
|
#
1.13 |
|
17-Sep-2017 |
maxv |
Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore.
|
Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.10; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX).
The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in the kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe.
Move the checks for fpu traps in kernel into x86/fpu.c.
Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode).
In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|
Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004
|
#
1.12 |
|
29-Sep-2016 |
maxv |
Remove outdated comments, typos, rename and reorder a few things.
|
Revision tags: localcount-20160914
|
#
1.11 |
|
18-Aug-2016 |
maxv |
Simplify.
|
Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
|
#
1.10 |
|
27-Nov-2014 |
uebayasi |
branches: 1.10.2; 1.10.4; Consistently use kpreempt_*() outside scheduler path.
|
Revision tags: netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
|
#
1.9 |
|
25-Feb-2014 |
dsl |
branches: 1.9.4; 1.9.6; 1.9.10; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.
|
#
1.8 |
|
23-Feb-2014 |
dsl |
Add fpu_set_default_cw() and use it in the emulations to set the default x87 control word. This means that nothing outside fpu.c cares about the internals of the fpu save area. New kernel modules won't load with the old kernel - but that won't matter.
|
#
1.7 |
|
23-Feb-2014 |
dsl |
Determine whether the cpu supports xsave (and hence AVX).
The result is only written to sysctl nodes at the moment. I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
|
#
1.6 |
|
15-Feb-2014 |
dsl |
Load and save the fpu registers (for copies to/from userspace) using helper functions in arch/x86/x86/fpu.c. They (hopefully) ensure that we write to the entire buffer and don't load values that might cause faults in the kernel. Also zero out the 'pad' field of the i386 mcontext fp area that I think once contained the registers of any Weitek fpu. Dunno why it wasn't part of the union. Some of these copies could be removed if the code directly copied the save area to/from userspace addresses.
|
#
1.5 |
|
15-Feb-2014 |
dsl |
Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
|
#
1.4 |
|
13-Feb-2014 |
dsl |
Check the argument types for the fpu asm functions.
|
#
1.3 |
|
12-Feb-2014 |
dsl |
Change i386 to use x86/fpu.c instead of i386/isa/npx.c. This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c. Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
|
#
1.2 |
|
12-Feb-2014 |
dsl |
Change the argument to fpudna() to be the trapframe.
Move the checks for fpu traps in kernel into x86/fpu.c.
Remove the code from amd64/trap.c related to fpu traps (they've not gone there for ages - expect to panic in kernel mode).
In fpudna():
- Don't actually enable hardware interrupts unless we need to allow in IPIs.
- There is no point in enabling them when they are blocked in software (by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERT()s firing.
|
#
1.1 |
|
11-Feb-2014 |
dsl |
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h into sys/arch/x86 in preparation for using the same code for i386.
|