History log of /netbsd-current/sys/arch/x86/include/fpu.h
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 1.23 24-Oct-2020 mgorny

Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs

When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64
use the 64-suffixed variant in order to include the complete FIP/FDP
registers in the x87 area.

The difference between the two variants is that the FXSAVE64 (new)
variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64),
while the legacy FXSAVE variant uses split fields: 32-bit offset,
16-bit segment and 16-bit reserved field (union fp_addr.fa_32).
The latter implies that the actual addresses are truncated to 32 bits
which is insufficient in modern programs.

The change is applied only to 64-bit programs on amd64. Plain i386
and compat32 continue using plain FXSAVE. Similarly, NVMM is not
changed as I am not familiar with that code.

This is a potentially breaking change. However, I don't think it likely
to actually break anything because the data provided by the old variant
were not meaningful (because of the truncated pointer).


# 1.22 15-Oct-2020 mgorny

Revert "Merge convert_xmm_s87.c into fpu.c"

I am going to add ATF tests for these two functions, and having them
in a separate file will make it more convenient to build and run them
in userspace.


# 1.21 14-Jun-2020 riastradh

Use static constant rather than stack memset buffer for zero fpregs.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.20 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


Revision tags: phil-wifi-20191119
# 1.19 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.18 04-Oct-2019 maxv

Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to
simplify.


Revision tags: netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.17 26-Jun-2019 mgorny

Implement PT_GETXSTATE and PT_SETXSTATE

Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE,
that provide access to the extended (and extensible) set of FPU
registers on amd64 and i386. At the moment, this covers AVX (YMM)
and AVX-512 (ZMM, opmask) registers. It can be easily extended
to cover further register types without breaking backwards
compatibility.

PT_GETXSTATE issues the XSAVE instruction with all kernel-supported
extended components enabled. The data is copied into 'struct xstate'
(which -- unlike the XSAVE area itself -- has stable format
and offsets).

PT_SETXSTATE issues the XRSTOR instruction to restore the register
values from user-provided 'struct xstate'. The function replaces only
the specific XSAVE components that are listed in 'xs_rfbm' field,
making it possible to issue partial updates.

Both syscalls take a 'struct iovec' pointer rather than a direct
argument. This requires the caller to explicitly specify the buffer
size. As a result, existing code will continue to work correctly
when the structure is extended (performing partial reads/updates).


Revision tags: phil-wifi-20190609
# 1.16 19-May-2019 maxv

Rename

fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset

Clearer, and reduces a future diff. No real functional change.


# 1.15 19-May-2019 maxv

Misc changes in the x86 FPU code. Reduces a future diff. No real functional
change.


Revision tags: isaki-audio2-base pgoyette-compat-20190127
# 1.14 20-Jan-2019 maxv

Improvements in NVMM

* Handle the FPU differently, limit the states via the given mask rather
than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure
that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by
the virtualizer, to force a reload from memory.

* Hide RDTSCP.

* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.

* Take ECX and not RCX on MSR instructions.


Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.13 05-Oct-2018 maxv

export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore


Revision tags: pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.12 22-Jun-2018 maxv

branches: 1.12.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.11 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independant of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.10 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


# 1.9 14-Jun-2018 maxv

Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.

Maybe we also need to clear the FPU in setregs(), not sure about this one.

Can be enabled/disabled via:

machdep.fpu_eager = {0/1}

Not yet turned on automatically on affected CPUs (Intel Family 6).

More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.


# 1.8 23-May-2018 maxv

Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that
are used only in fpu.c.


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.7 03-Nov-2017 maxv

branches: 1.7.2;
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking
MXCSR we are losing some features (eg DAZ).


Revision tags: netbsd-7-2-RELEASE netbsd-8-0-RC1 netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 netbsd-7-1-RELEASE netbsd-7-1-RC2 nick-nhusb-base-20170204 netbsd-7-nhusb-base-20170116 bouyer-socketcan-base pgoyette-localcount-20170107 netbsd-7-1-RC1 nick-nhusb-base-20161204 pgoyette-localcount-20161104 netbsd-7-0-2-RELEASE nick-nhusb-base-20161004 localcount-20160914 netbsd-7-nhusb-base pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.6 25-Feb-2014 dsl

branches: 1.6.4; 1.6.6; 1.6.10; 1.6.28;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zerod),
but I don't think the ymm registers are read/written unless they
have been used.
The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.5 23-Feb-2014 dsl

Add fpu_set_default_cw() and use it in the emulations to set the default
x87 control word.
This means that nothing outside fpu.c cares about the internals of the
fpu save area.
New kernel modules won't load with the old kernel - but that won't matter.


# 1.4 15-Feb-2014 dsl

Load and save the fpu registers (for copies to/from userspace) using
helper functions in arch/x86/x86/fpu.c
They (hopefully) ensure that we write to the entire buffer and don't load
values that might cause faults in kernel.
Also zero out the 'pad' field of the i386 mcontext fp area that I think
once contained the registers of any Weitek fpu.
Dunno why it wasn't pasrt of the union.
Some of these copies could be removed if the code directly copied the save
area to/from userspace addresses.


# 1.3 15-Feb-2014 dsl

Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).


# 1.2 12-Feb-2014 dsl

Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code thate appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).


# 1.1 11-Feb-2014 dsl

Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.


# 1.21 14-Jun-2020 riastradh

Use static constant rather than stack memset buffer for zero fpregs.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base
# 1.20 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


Revision tags: phil-wifi-20191119
# 1.19 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.18 04-Oct-2019 maxv

Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to
simplify.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.17 26-Jun-2019 mgorny

Implement PT_GETXSTATE and PT_SETXSTATE

Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE,
that provide access to the extended (and extensible) set of FPU
registers on amd64 and i386. At the moment, this covers AVX (YMM)
and AVX-512 (ZMM, opmask) registers. It can be easily extended
to cover further register types without breaking backwards
compatibility.

PT_GETXSTATE issues the XSAVE instruction with all kernel-supported
extended components enabled. The data is copied into 'struct xstate'
(which -- unlike the XSAVE area itself -- has stable format
and offsets).

PT_SETXSTATE issues the XRSTOR instruction to restore the register
values from user-provided 'struct xstate'. The function replaces only
the specific XSAVE components that are listed in 'xs_rfbm' field,
making it possible to issue partial updates.

Both syscalls take a 'struct iovec' pointer rather than a direct
argument. This requires the caller to explicitly specify the buffer
size. As a result, existing code will continue to work correctly
when the structure is extended (performing partial reads/updates).


Revision tags: phil-wifi-20190609
# 1.16 19-May-2019 maxv

Rename

fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset

Clearer, and reduces a future diff. No real functional change.


# 1.15 19-May-2019 maxv

Misc changes in the x86 FPU code. Reduces a future diff. No real functional
change.


Revision tags: isaki-audio2-base pgoyette-compat-20190127
# 1.14 20-Jan-2019 maxv

Improvements in NVMM

* Handle the FPU differently, limit the states via the given mask rather
than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure
that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by
the virtualizer, to force a reload from memory.

* Hide RDTSCP.

* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.

* Take ECX and not RCX on MSR instructions.


Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.13 05-Oct-2018 maxv

export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore


Revision tags: pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.12 22-Jun-2018 maxv

branches: 1.12.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.11 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independant of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.10 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


# 1.9 14-Jun-2018 maxv

Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.

Maybe we also need to clear the FPU in setregs(), not sure about this one.

Can be enabled/disabled via:

machdep.fpu_eager = {0/1}

Not yet turned on automatically on affected CPUs (Intel Family 6).

More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.


# 1.8 23-May-2018 maxv

Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that
are used only in fpu.c.


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.7 03-Nov-2017 maxv

branches: 1.7.2;
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking
MXCSR we are losing some features (eg DAZ).


Revision tags: netbsd-7-2-RELEASE netbsd-8-0-RC1 netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 netbsd-7-1-RELEASE netbsd-7-1-RC2 nick-nhusb-base-20170204 netbsd-7-nhusb-base-20170116 bouyer-socketcan-base pgoyette-localcount-20170107 netbsd-7-1-RC1 nick-nhusb-base-20161204 pgoyette-localcount-20161104 netbsd-7-0-2-RELEASE nick-nhusb-base-20161004 localcount-20160914 netbsd-7-nhusb-base pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.6 25-Feb-2014 dsl

branches: 1.6.4; 1.6.6; 1.6.10; 1.6.28;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zerod),
but I don't think the ymm registers are read/written unless they
have been used.
The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.5 23-Feb-2014 dsl

Add fpu_set_default_cw() and use it in the emulations to set the default
x87 control word.
This means that nothing outside fpu.c cares about the internals of the
fpu save area.
New kernel modules won't load with the old kernel - but that won't matter.


# 1.4 15-Feb-2014 dsl

Load and save the fpu registers (for copies to/from userspace) using
helper functions in arch/x86/x86/fpu.c
They (hopefully) ensure that we write to the entire buffer and don't load
values that might cause faults in kernel.
Also zero out the 'pad' field of the i386 mcontext fp area that I think
once contained the registers of any Weitek fpu.
Dunno why it wasn't pasrt of the union.
Some of these copies could be removed if the code directly copied the save
area to/from userspace addresses.


# 1.3 15-Feb-2014 dsl

Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).


# 1.2 12-Feb-2014 dsl

Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code thate appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).


# 1.1 11-Feb-2014 dsl

Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.


# 1.20 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


Revision tags: phil-wifi-20191119
# 1.19 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.18 04-Oct-2019 maxv

Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to
simplify.


Revision tags: netbsd-9-base
# 1.17 26-Jun-2019 mgorny

Implement PT_GETXSTATE and PT_SETXSTATE

Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE,
that provide access to the extended (and extensible) set of FPU
registers on amd64 and i386. At the moment, this covers AVX (YMM)
and AVX-512 (ZMM, opmask) registers. It can be easily extended
to cover further register types without breaking backwards
compatibility.

PT_GETXSTATE issues the XSAVE instruction with all kernel-supported
extended components enabled. The data is copied into 'struct xstate'
(which -- unlike the XSAVE area itself -- has stable format
and offsets).

PT_SETXSTATE issues the XRSTOR instruction to restore the register
values from user-provided 'struct xstate'. The function replaces only
the specific XSAVE components that are listed in 'xs_rfbm' field,
making it possible to issue partial updates.

Both syscalls take a 'struct iovec' pointer rather than a direct
argument. This requires the caller to explicitly specify the buffer
size. As a result, existing code will continue to work correctly
when the structure is extended (performing partial reads/updates).


Revision tags: phil-wifi-20190609
# 1.16 19-May-2019 maxv

Rename

fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset

Clearer, and reduces a future diff. No real functional change.


# 1.15 19-May-2019 maxv

Misc changes in the x86 FPU code. Reduces a future diff. No real functional
change.


Revision tags: isaki-audio2-base pgoyette-compat-20190127
# 1.14 20-Jan-2019 maxv

Improvements in NVMM

* Handle the FPU differently, limit the states via the given mask rather
than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure
that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by
the virtualizer, to force a reload from memory.

* Hide RDTSCP.

* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.

* Take ECX and not RCX on MSR instructions.


Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.13 05-Oct-2018 maxv

export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore


Revision tags: pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.12 22-Jun-2018 maxv

branches: 1.12.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.11 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independant of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.10 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


# 1.9 14-Jun-2018 maxv

Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.

Maybe we also need to clear the FPU in setregs(), not sure about this one.

Can be enabled/disabled via:

machdep.fpu_eager = {0/1}

Not yet turned on automatically on affected CPUs (Intel Family 6).

More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.


# 1.8 23-May-2018 maxv

Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that
are used only in fpu.c.


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.7 03-Nov-2017 maxv

branches: 1.7.2;
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking
MXCSR we are losing some features (eg DAZ).


Revision tags: netbsd-7-2-RELEASE netbsd-8-0-RC1 netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 netbsd-7-1-RELEASE netbsd-7-1-RC2 nick-nhusb-base-20170204 netbsd-7-nhusb-base-20170116 bouyer-socketcan-base pgoyette-localcount-20170107 netbsd-7-1-RC1 nick-nhusb-base-20161204 pgoyette-localcount-20161104 netbsd-7-0-2-RELEASE nick-nhusb-base-20161004 localcount-20160914 netbsd-7-nhusb-base pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.6 25-Feb-2014 dsl

branches: 1.6.4; 1.6.6; 1.6.10; 1.6.28;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zerod),
but I don't think the ymm registers are read/written unless they
have been used.
The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.5 23-Feb-2014 dsl

Add fpu_set_default_cw() and use it in the emulations to set the default
x87 control word.
This means that nothing outside fpu.c cares about the internals of the
fpu save area.
New kernel modules won't load with the old kernel - but that won't matter.


# 1.4 15-Feb-2014 dsl

Load and save the fpu registers (for copies to/from userspace) using
helper functions in arch/x86/x86/fpu.c
They (hopefully) ensure that we write to the entire buffer and don't load
values that might cause faults in kernel.
Also zero out the 'pad' field of the i386 mcontext fp area that I think
once contained the registers of any Weitek fpu.
Dunno why it wasn't pasrt of the union.
Some of these copies could be removed if the code directly copied the save
area to/from userspace addresses.


# 1.3 15-Feb-2014 dsl

Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).


# 1.2 12-Feb-2014 dsl

Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code thate appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).


# 1.1 11-Feb-2014 dsl

Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.


# 1.19 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.18 04-Oct-2019 maxv

Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to
simplify.


Revision tags: netbsd-9-base
# 1.17 26-Jun-2019 mgorny

Implement PT_GETXSTATE and PT_SETXSTATE

Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE,
that provide access to the extended (and extensible) set of FPU
registers on amd64 and i386. At the moment, this covers AVX (YMM)
and AVX-512 (ZMM, opmask) registers. It can be easily extended
to cover further register types without breaking backwards
compatibility.

PT_GETXSTATE issues the XSAVE instruction with all kernel-supported
extended components enabled. The data is copied into 'struct xstate'
(which -- unlike the XSAVE area itself -- has stable format
and offsets).

PT_SETXSTATE issues the XRSTOR instruction to restore the register
values from user-provided 'struct xstate'. The function replaces only
the specific XSAVE components that are listed in 'xs_rfbm' field,
making it possible to issue partial updates.

Both syscalls take a 'struct iovec' pointer rather than a direct
argument. This requires the caller to explicitly specify the buffer
size. As a result, existing code will continue to work correctly
when the structure is extended (performing partial reads/updates).


Revision tags: phil-wifi-20190609
# 1.16 19-May-2019 maxv

Rename

fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset

Clearer, and reduces a future diff. No real functional change.


# 1.15 19-May-2019 maxv

Misc changes in the x86 FPU code. Reduces a future diff. No real functional
change.


Revision tags: isaki-audio2-base pgoyette-compat-20190127
# 1.14 20-Jan-2019 maxv

Improvements in NVMM

* Handle the FPU differently, limit the states via the given mask rather
than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure
that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by
the virtualizer, to force a reload from memory.

* Hide RDTSCP.

* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.

* Take ECX and not RCX on MSR instructions.


Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.13 05-Oct-2018 maxv

export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore


Revision tags: pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.12 22-Jun-2018 maxv

branches: 1.12.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.11 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independant of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.10 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


# 1.9 14-Jun-2018 maxv

Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.

Maybe we also need to clear the FPU in setregs(), not sure about this one.

Can be enabled/disabled via:

machdep.fpu_eager = {0/1}

Not yet turned on automatically on affected CPUs (Intel Family 6).

More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.


# 1.8 23-May-2018 maxv

Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that
are used only in fpu.c.


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.7 03-Nov-2017 maxv

branches: 1.7.2;
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking
MXCSR we are losing some features (eg DAZ).


Revision tags: netbsd-7-2-RELEASE netbsd-8-0-RC1 netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 netbsd-7-1-RELEASE netbsd-7-1-RC2 nick-nhusb-base-20170204 netbsd-7-nhusb-base-20170116 bouyer-socketcan-base pgoyette-localcount-20170107 netbsd-7-1-RC1 nick-nhusb-base-20161204 pgoyette-localcount-20161104 netbsd-7-0-2-RELEASE nick-nhusb-base-20161004 localcount-20160914 netbsd-7-nhusb-base pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.6 25-Feb-2014 dsl

branches: 1.6.4; 1.6.6; 1.6.10; 1.6.28;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zerod),
but I don't think the ymm registers are read/written unless they
have been used.
The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.5 23-Feb-2014 dsl

Add fpu_set_default_cw() and use it in the emulations to set the default
x87 control word.
This means that nothing outside fpu.c cares about the internals of the
fpu save area.
New kernel modules won't load with the old kernel - but that won't matter.


# 1.4 15-Feb-2014 dsl

Load and save the fpu registers (for copies to/from userspace) using
helper functions in arch/x86/x86/fpu.c
They (hopefully) ensure that we write to the entire buffer and don't load
values that might cause faults in kernel.
Also zero out the 'pad' field of the i386 mcontext fp area that I think
once contained the registers of any Weitek fpu.
Dunno why it wasn't pasrt of the union.
Some of these copies could be removed if the code directly copied the save
area to/from userspace addresses.


# 1.3 15-Feb-2014 dsl

Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).


# 1.2 12-Feb-2014 dsl

Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code thate appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).


# 1.1 11-Feb-2014 dsl

Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.


# 1.18 04-Oct-2019 maxv

Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to
simplify.


Revision tags: netbsd-9-base
# 1.17 26-Jun-2019 mgorny

Implement PT_GETXSTATE and PT_SETXSTATE

Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE,
that provide access to the extended (and extensible) set of FPU
registers on amd64 and i386. At the moment, this covers AVX (YMM)
and AVX-512 (ZMM, opmask) registers. It can be easily extended
to cover further register types without breaking backwards
compatibility.

PT_GETXSTATE issues the XSAVE instruction with all kernel-supported
extended components enabled. The data is copied into 'struct xstate'
(which -- unlike the XSAVE area itself -- has stable format
and offsets).

PT_SETXSTATE issues the XRSTOR instruction to restore the register
values from user-provided 'struct xstate'. The function replaces only
the specific XSAVE components that are listed in 'xs_rfbm' field,
making it possible to issue partial updates.

Both syscalls take a 'struct iovec' pointer rather than a direct
argument. This requires the caller to explicitly specify the buffer
size. As a result, existing code will continue to work correctly
when the structure is extended (performing partial reads/updates).


Revision tags: phil-wifi-20190609
# 1.16 19-May-2019 maxv

Rename

fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset

Clearer, and reduces a future diff. No real functional change.


# 1.15 19-May-2019 maxv

Misc changes in the x86 FPU code. Reduces a future diff. No real functional
change.


Revision tags: isaki-audio2-base pgoyette-compat-20190127
# 1.14 20-Jan-2019 maxv

Improvements in NVMM

* Handle the FPU differently, limit the states via the given mask rather
than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure
that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by
the virtualizer, to force a reload from memory.

* Hide RDTSCP.

* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.

* Take ECX and not RCX on MSR instructions.


Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.13 05-Oct-2018 maxv

export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore


Revision tags: pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.12 22-Jun-2018 maxv

branches: 1.12.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.11 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independant of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.10 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


# 1.9 14-Jun-2018 maxv

Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.

Maybe we also need to clear the FPU in setregs(), not sure about this one.

Can be enabled/disabled via:

machdep.fpu_eager = {0/1}

Not yet turned on automatically on affected CPUs (Intel Family 6).

More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.


# 1.8 23-May-2018 maxv

Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that
are used only in fpu.c.


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.7 03-Nov-2017 maxv

branches: 1.7.2;
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking
MXCSR we are losing some features (eg DAZ).


Revision tags: netbsd-7-2-RELEASE netbsd-8-0-RC1 netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 netbsd-7-1-RELEASE netbsd-7-1-RC2 nick-nhusb-base-20170204 netbsd-7-nhusb-base-20170116 bouyer-socketcan-base pgoyette-localcount-20170107 netbsd-7-1-RC1 nick-nhusb-base-20161204 pgoyette-localcount-20161104 netbsd-7-0-2-RELEASE nick-nhusb-base-20161004 localcount-20160914 netbsd-7-nhusb-base pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.6 25-Feb-2014 dsl

branches: 1.6.4; 1.6.6; 1.6.10; 1.6.28;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zerod),
but I don't think the ymm registers are read/written unless they
have been used.
The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.5 23-Feb-2014 dsl

Add fpu_set_default_cw() and use it in the emulations to set the default
x87 control word.
This means that nothing outside fpu.c cares about the internals of the
fpu save area.
New kernel modules won't load with the old kernel - but that won't matter.


# 1.4 15-Feb-2014 dsl

Load and save the fpu registers (for copies to/from userspace) using
helper functions in arch/x86/x86/fpu.c
They (hopefully) ensure that we write to the entire buffer and don't load
values that might cause faults in kernel.
Also zero out the 'pad' field of the i386 mcontext fp area that I think
once contained the registers of any Weitek fpu.
Dunno why it wasn't pasrt of the union.
Some of these copies could be removed if the code directly copied the save
area to/from userspace addresses.


# 1.3 15-Feb-2014 dsl

Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).


# 1.2 12-Feb-2014 dsl

Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code thate appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).


# 1.1 11-Feb-2014 dsl

Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.


# 1.17 26-Jun-2019 mgorny

Implement PT_GETXSTATE and PT_SETXSTATE

Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE,
that provide access to the extended (and extensible) set of FPU
registers on amd64 and i386. At the moment, this covers AVX (YMM)
and AVX-512 (ZMM, opmask) registers. It can be easily extended
to cover further register types without breaking backwards
compatibility.

PT_GETXSTATE issues the XSAVE instruction with all kernel-supported
extended components enabled. The data is copied into 'struct xstate'
(which -- unlike the XSAVE area itself -- has stable format
and offsets).

PT_SETXSTATE issues the XRSTOR instruction to restore the register
values from user-provided 'struct xstate'. The function replaces only
the specific XSAVE components that are listed in 'xs_rfbm' field,
making it possible to issue partial updates.

Both syscalls take a 'struct iovec' pointer rather than a direct
argument. This requires the caller to explicitly specify the buffer
size. As a result, existing code will continue to work correctly
when the structure is extended (performing partial reads/updates).


Revision tags: phil-wifi-20190609
# 1.16 19-May-2019 maxv

Rename

fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset

Clearer, and reduces a future diff. No real functional change.


# 1.15 19-May-2019 maxv

Misc changes in the x86 FPU code. Reduces a future diff. No real functional
change.


Revision tags: isaki-audio2-base pgoyette-compat-20190127
# 1.14 20-Jan-2019 maxv

Improvements in NVMM

* Handle the FPU differently, limit the states via the given mask rather
than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure
that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by
the virtualizer, to force a reload from memory.

* Hide RDTSCP.

* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.

* Take ECX and not RCX on MSR instructions.


Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.13 05-Oct-2018 maxv

export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore


Revision tags: pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.12 22-Jun-2018 maxv

branches: 1.12.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.11 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independant of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.10 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


# 1.9 14-Jun-2018 maxv

Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.

Maybe we also need to clear the FPU in setregs(), not sure about this one.

Can be enabled/disabled via:

machdep.fpu_eager = {0/1}

Not yet turned on automatically on affected CPUs (Intel Family 6).

More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.


# 1.8 23-May-2018 maxv

Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that
are used only in fpu.c.


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.7 03-Nov-2017 maxv

branches: 1.7.2;
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking
MXCSR we are losing some features (eg DAZ).


Revision tags: netbsd-7-2-RELEASE netbsd-8-0-RC1 netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 netbsd-7-1-RELEASE netbsd-7-1-RC2 nick-nhusb-base-20170204 netbsd-7-nhusb-base-20170116 bouyer-socketcan-base pgoyette-localcount-20170107 netbsd-7-1-RC1 nick-nhusb-base-20161204 pgoyette-localcount-20161104 netbsd-7-0-2-RELEASE nick-nhusb-base-20161004 localcount-20160914 netbsd-7-nhusb-base pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.6 25-Feb-2014 dsl

branches: 1.6.4; 1.6.6; 1.6.10; 1.6.28;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zerod),
but I don't think the ymm registers are read/written unless they
have been used.
The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.5 23-Feb-2014 dsl

Add fpu_set_default_cw() and use it in the emulations to set the default
x87 control word.
This means that nothing outside fpu.c cares about the internals of the
fpu save area.
New kernel modules won't load with the old kernel - but that won't matter.


# 1.4 15-Feb-2014 dsl

Load and save the fpu registers (for copies to/from userspace) using
helper functions in arch/x86/x86/fpu.c
They (hopefully) ensure that we write to the entire buffer and don't load
values that might cause faults in kernel.
Also zero out the 'pad' field of the i386 mcontext fp area that I think
once contained the registers of any Weitek fpu.
Dunno why it wasn't pasrt of the union.
Some of these copies could be removed if the code directly copied the save
area to/from userspace addresses.


# 1.3 15-Feb-2014 dsl

Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).


# 1.2 12-Feb-2014 dsl

Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code thate appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).


# 1.1 11-Feb-2014 dsl

Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.


# 1.16 19-May-2019 maxv

Rename

fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset

Clearer, and reduces a future diff. No real functional change.


# 1.15 19-May-2019 maxv

Misc changes in the x86 FPU code. Reduces a future diff. No real functional
change.


Revision tags: isaki-audio2-base pgoyette-compat-20190127
# 1.14 20-Jan-2019 maxv

Improvements in NVMM

* Handle the FPU differently, limit the states via the given mask rather
than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure
that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by
the virtualizer, to force a reload from memory.

* Hide RDTSCP.

* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.

* Take ECX and not RCX on MSR instructions.


Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.13 05-Oct-2018 maxv

export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore


Revision tags: pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.12 22-Jun-2018 maxv

Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.11 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independant of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.10 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


# 1.9 14-Jun-2018 maxv

Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.

Maybe we also need to clear the FPU in setregs(), not sure about this one.

Can be enabled/disabled via:

machdep.fpu_eager = {0/1}

Not yet turned on automatically on affected CPUs (Intel Family 6).

More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.


# 1.8 23-May-2018 maxv

Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that
are used only in fpu.c.


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.7 03-Nov-2017 maxv

branches: 1.7.2;
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking
MXCSR we are losing some features (eg DAZ).


Revision tags: netbsd-7-2-RELEASE netbsd-8-0-RC1 netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 netbsd-7-1-RELEASE netbsd-7-1-RC2 nick-nhusb-base-20170204 netbsd-7-nhusb-base-20170116 bouyer-socketcan-base pgoyette-localcount-20170107 netbsd-7-1-RC1 nick-nhusb-base-20161204 pgoyette-localcount-20161104 netbsd-7-0-2-RELEASE nick-nhusb-base-20161004 localcount-20160914 netbsd-7-nhusb-base pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.6 25-Feb-2014 dsl

branches: 1.6.4; 1.6.6; 1.6.10; 1.6.28;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zerod),
but I don't think the ymm registers are read/written unless they
have been used.
The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.5 23-Feb-2014 dsl

Add fpu_set_default_cw() and use it in the emulations to set the default
x87 control word.
This means that nothing outside fpu.c cares about the internals of the
fpu save area.
New kernel modules won't load with the old kernel - but that won't matter.


# 1.4 15-Feb-2014 dsl

Load and save the fpu registers (for copies to/from userspace) using
helper functions in arch/x86/x86/fpu.c
They (hopefully) ensure that we write to the entire buffer and don't load
values that might cause faults in kernel.
Also zero out the 'pad' field of the i386 mcontext fp area that I think
once contained the registers of any Weitek fpu.
Dunno why it wasn't pasrt of the union.
Some of these copies could be removed if the code directly copied the save
area to/from userspace addresses.


# 1.3 15-Feb-2014 dsl

Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).


# 1.2 12-Feb-2014 dsl

Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code thate appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).


# 1.1 11-Feb-2014 dsl

Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.


Revision tags: isaki-audio2-base pgoyette-compat-20190127
# 1.14 20-Jan-2019 maxv

Improvements in NVMM

* Handle the FPU differently, limit the states via the given mask rather
than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure
that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by
the virtualizer, to force a reload from memory.

* Hide RDTSCP.

* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.

* Take ECX and not RCX on MSR instructions.


Revision tags: pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020
# 1.13 05-Oct-2018 maxv

export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore


Revision tags: pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.12 22-Jun-2018 maxv

Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.11 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independant of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.10 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


# 1.9 14-Jun-2018 maxv

Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.

Maybe we also need to clear the FPU in setregs(), not sure about this one.

Can be enabled/disabled via:

machdep.fpu_eager = {0/1}

Not yet turned on automatically on affected CPUs (Intel Family 6).

More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.


# 1.8 23-May-2018 maxv

Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that
are used only in fpu.c.


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202
# 1.7 03-Nov-2017 maxv

branches: 1.7.2;
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking
MXCSR we are losing some features (eg DAZ).


Revision tags: netbsd-7-2-RELEASE netbsd-8-0-RC1 netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 netbsd-7-1-RELEASE netbsd-7-1-RC2 nick-nhusb-base-20170204 netbsd-7-nhusb-base-20170116 bouyer-socketcan-base pgoyette-localcount-20170107 netbsd-7-1-RC1 nick-nhusb-base-20161204 pgoyette-localcount-20161104 netbsd-7-0-2-RELEASE nick-nhusb-base-20161004 localcount-20160914 netbsd-7-nhusb-base pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base tls-maxphys-base netbsd-7-base rmind-smpnet-base rmind-smpnet-nbase yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.6 25-Feb-2014 dsl

branches: 1.6.4; 1.6.6; 1.6.10; 1.6.28;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zerod),
but I don't think the ymm registers are read/written unless they
have been used.
The code use XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.5 23-Feb-2014 dsl

Add fpu_set_default_cw() and use it in the emulations to set the default
x87 control word.
This means that nothing outside fpu.c cares about the internals of the
fpu save area.
New kernel modules won't load with the old kernel - but that won't matter.


# 1.4 15-Feb-2014 dsl

Load and save the fpu registers (for copies to/from userspace) using
helper functions in arch/x86/x86/fpu.c
They (hopefully) ensure that we write to the entire buffer and don't load
values that might cause faults in kernel.
Also zero out the 'pad' field of the i386 mcontext fp area that I think
once contained the registers of any Weitek fpu.
Dunno why it wasn't pasrt of the union.
Some of these copies could be removed if the code directly copied the save
area to/from userspace addresses.


# 1.3 15-Feb-2014 dsl

Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).


# 1.2 12-Feb-2014 dsl

Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code thate appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).


# 1.1 11-Feb-2014 dsl

Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.