History log of /linux-master/arch/s390/kernel/fpu.c
Revision Date Author Comments
# 8c09871a 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: limit save and restore to used registers

The first invocation of kernel_fpu_begin() after switching from user to
kernel context will save all vector registers, even if only parts of the
vector registers are used within the kernel fpu context. Given that save
and restore of all vector registers is quite expensive change the current
approach in several ways:

- Instead of saving and restoring all user registers limit this to those
registers which are actually used within an kernel fpu context.

- On context switch save all remaining user fpu registers, so they can be
restored when the task is rescheduled.

- Saving user registers within kernel_fpu_begin() is done without disabling
and enabling interrupts - which also slightly reduces runtime. In worst
case (e.g. interrupt context uses the same registers) this may lead to
the situation that registers are saved several times, however the
assumption is that this will not happen frequently, so that the new
method is faster in nearly all cases.

- save_user_fpu_regs() can still be called from all contexts and saves all
(or all remaining) user registers to a tasks ufpu user fpu save area.

Overall this reduces the time required to save and restore the user fpu
context for nearly all cases.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 066c4091 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: decrease stack usage for some cases

The kernel_fpu structure has a quite large size of 520 bytes. In order to
reduce stack footprint introduce several kernel fpu structures with
different and also smaller sizes. This way every kernel fpu user must use
the correct variant. A compile time check verifies that the correct variant
is used.

There are several users which use only 16 instead of all 32 vector
registers. For those users the new kernel_fpu_16 structure with a size of
only 266 bytes can be used.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# bdbd3acb 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: remove anonymous union from struct fpu

The anonymous union within struct fpu contains a floating point register
array and a vector register array. Given that the vector register is always
present remove the floating point register array. For configurations
without vector registers save the floating point register contents within
their corresponding vector register location.

This allows to remove the union, and also to simplify ptrace and perf code.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 9cbff7f2 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: remove regs member from struct fpu

KVM was the only user which modified the regs pointer in struct fpu. Remove
the pointer and convert the rest of the core fpu code to directly access
the save area embedded within struct fpu.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# c038b984 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: change type of fpu mask from u32 to int

Change type of fpu mask consistently from u32 to int. This is a
prerequisite to make the kernel fpu usage preemptible. Upcoming code
uses __atomic* ops which work with int pointers.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 87c5c700 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: rename save_fpu_regs() to save_user_fpu_regs(), etc

Rename save_fpu_regs(), load_fpu_regs(), and struct thread_struct's fpu
member to save_user_fpu_regs(), load_user_fpu_regs(), and ufpu. This way
the function and variable names reflect for which context they are supposed
to be used.

This large and trivial conversion is a prerequisite for making the kernel
fpu usage preemptible.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 419abc4d 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: convert FPU CIF flag to regular TIF flag

The FPU state, as represented by the CIF_FPU flag reflects the FPU state of
a task, not the CPU it is running on. Therefore convert the flag to a
regular TIF flag.

This removes the magic in switch_to() where a save_fpu_regs() call for the
currently (previous) running task sets the per-cpu CIF_FPU flag, which is
required to restore FPU register contents of the next task, when it returns
to user space.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 918c7cad 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: convert __kernel_fpu_begin()/__kernel_fpu_end() to C

Convert the rather large __kernel_fpu_begin()/__kernel_fpu_end() inline
assemblies to C. The C variant is much more readable, and this also allows
to get rid of the non-obvious usage of KERNEL_VXR_* constants within the
inline assemblies. E.g. "tmll %[m],6" correlates with the two bits set in
KERNEL_VXR_LOW. If the corresponding defines would be changed, the inline
assembles would break in a subtle way.

Therefore convert to C, use the proper defines, and allow the compiler to
generate code using the (hopefully) most efficient instructions.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 3a5866a0 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: provide and use vlm and vstm inline assemblies

Instead of open-coding vlm and vstm inline assemblies at several locations,
provide an fpu_* function for each instruction, and use them in the new
save_vx_regs() and load_vx_regs() helper functions.

Note that "O" and "R" inline assembly operand modifiers are used in order
to pass the displacement and base register of the memory operands to the
existing VLM and VSTM macros. The two operand modifiers are not available
for clang. Therefore provide two variants of each inline assembly.

The clang variant always uses and clobbers general purpose register 1, like
in the previous inline assemblies, so it can be used as base register with
a zero displacement. This generates slightly less efficient code, but can
be removed as soon as clang has support for the used operand modifiers.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# f4e3de75 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: provide and use lfpc, sfpc, and stfpc inline assemblies

Instead of open-coding lfpc, sfpc, and stfpc inline assemblies at
several locations, provide an fpu_* function for each instruction and
use the function instead.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 88d8136a 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: provide and use ld and std inline assemblies

Deduplicate the 64 ld and std inline assemblies. Provide an fpu inline
assembly for both instructions, and use them in the new save_fp_regs()
and load_fp_regs() helper functions.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 13a8a519 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: use lfpc instead of sfpc instruction

The only user of sfpc_safe() needs to read the new fpc register value
from memory before it is set with sfpc.

Avoid this indirection and use lfpc, which reads the new value from
memory. Also add the "fpu_" prefix to have a common name space for fpu
related inline assemblies, and provide memory access instrumentation.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# fd2527f2 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: move, rename, and merge header files

Move, rename, and merge the fpu and vx header files. This way fpu header
files have a consistent naming scheme (fpu*.h).

Also get rid of the fpu subdirectory and move header files to asm
directory, so that all fpu and vx header files can be found at the same
location.

Merge internal.h header file into other header files, since the internal
helpers are used at many locations. so those helper functions are really
not internal.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 31d3ec15 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: various coding style changes

Address various checkpatch warnings, adjust whitespace, and try to increase
readability. This is just preparation, in order to avoid that subsequent
patches contain any distracting drive-by coding style changes.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# b6b842be 03-Feb-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: use KERNEL_VXR_LOW instead of KERNEL_VXR_V0V7

Use KERNEL_VXR_LOW instead of KERNEL_VXR_V0V7 for configurations without
vector registers in order to decide if floating point registers need to be
saved and restored.

Kernel FPU areas which use floating point registers are supposed to use the
KERNEL_FPR mask, however users may also open-code this and specify
KERNEL_VXR_V0V7 and/or KERNEL_VXR_V8V15. If only KERNEL_VXR_V8V15 is
specified floating point registers wouldn't be saved and restored. Improve
this and check for both bits.

There are currently no users where this would fix a bug.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 74ca8961 05-Jan-2024 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: remove __load_fpu_regs() export

__load_fpu_regs() is only called from core kernel code.
Therefore remove the not needed export.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>


# 18564756 01-Dec-2023 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: get rid of MACHINE_HAS_VX

Get rid of MACHINE_HAS_VX and replace it with cpu_has_vx() which is a
short readable wrapper for "test_facility(129)".

Facility bit 129 is set if the vector facility is present. test_facility()
returns also true for all bits which are set in the architecture level set
of the cpu that the kernel is compiled for. This means that
test_facility(129) is a compile time constant which returns true for z13
and later, since the vector facility bit is part of the z13 kernel ALS.

In result the compiled code will have less runtime checks, and less code.

Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>


# 70264424 30-Nov-2023 Heiko Carstens <hca@linux.ibm.com>

s390/fpu: get rid of test_fp_ctl()

It is quite subtle to use test_fp_ctl() correctly. Therefore remove it -
instead copy whatever new floating point control (fpc) register values are
supposed to be used into its save area.

Test the validity of the new value when loading it. If the new value is
invalid, load the fpc register with zero.

This seems to be a the best way to approach this problem. Even though this
changes behavior:

- sigreturn with an invalid fpc value on the stack will succeed, and
continue with zero value, instead of returning with SIGSEGV

- ptraced processes will also use a zero value instead of letting the
request fail with -EINVAL

However all of this seems to acceptable. After all testing of the value was
only implemented to avoid that user space can crash the kernel. It is not
there to test values for validity; and the assumption is that there is no
existing user space which is doing this.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>


# 706f2ada 01-Dec-2022 Heiko Carstens <hca@linux.ibm.com>

s390/vx: add vx-insn.h wrapper include file

The vector instruction macros can also be used in inline assemblies. For
this the magic

asm(".include \"asm/vx-insn.h\"\n");

must be added to C files in order to avoid that the pre-processor
eliminates the __ASSEMBLY__ guarded macros. This however comes with the
problem that changes to asm/vx-insn.h do not cause a recompile of C files
which have only this magic statement instead of a proper include statement.
This can be observed with the arch/s390/kernel/fpu.c file.

In order to fix this problem and also to avoid that the include must
be specified twice, add a wrapper include header file which will do
all necessary steps.

This way only the vx-insn.h header file needs to be included and changes to
the new vx-insn-asm.h header file cause a recompile of all dependent files
like it should.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>


# 56e62a73 21-Nov-2020 Sven Schnelle <svens@linux.ibm.com>

s390: convert to generic entry

This patch converts s390 to use the generic entry infrastructure from
kernel/entry/*.

There are a few special things on s390:

- PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't
know about our PIF flags in exit_to_user_mode_loop().

- The old code had several ways to restart syscalls:

a) PIF_SYSCALL_RESTART, which was only set during execve to force a
restart after upgrading a process (usually qemu-kvm) to pgste page
table extensions.

b) PIF_SYSCALL, which is set by do_signal() to indicate that the
current syscall should be restarted. This is changed so that
do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use
PIF_SYSCALL doesn't work with the generic code, and changing it
to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART
more unique.

- On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by
executing a svc instruction on the process stack which causes a fault.
While handling that fault the fault code sets PIF_SYSCALL to hand over
processing to the syscall code on exit to usermode.

The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets
a return value for a syscall. The s390x ptrace ABI uses r2 both for the
syscall number and return value, so ptrace cannot set the syscall number +
return value at the same time. The flag makes handling that a bit easier.
do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET
is set.

CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY.
CR1/7/13 will be checked both on kernel entry and exit to contain the
correct asces.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>


# 35af0d46 14-Apr-2019 Vasily Gorbik <gor@linux.ibm.com>

s390: correct some inline assembly constraints

Inline assembly code changed in this patch should really use "Q"
constraint "Memory reference without index register and with short
displacement". The kernel build with kasan instrumentation enabled
might occasionally break otherwise (due to stack instrumentation).

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 7f79695c 21-Aug-2016 Martin Schwidefsky <schwidefsky@de.ibm.com>

s390/fpu: improve kernel_fpu_[begin|end]

In case of nested user of the FPU or vector registers in the kernel
the current code uses the mask of the FPU/vector registers of the
previous contexts to decide which registers to save and restore.
E.g. if the previous context used KERNEL_VXR_V0V7 and the next
context wants to use KERNEL_VXR_V24V31 the first 8 vector registers
are stored to the FPU state structure. But this is not necessary
as the next context does not use these registers.

Rework the FPU/vector register save and restore code. The new code
does a few things differently:
1) A lowcore field is used instead of a per-cpu variable.
2) The kernel_fpu_end function now has two parameters just like
kernel_fpu_begin. The register flags are required by both
functions to save / restore the minimal register set.
3) The inline functions kernel_fpu_begin/kernel_fpu_end now do the
update of the register masks. If the user space FPU registers
have already been stored neither save_fpu_regs nor the
__kernel_fpu_begin/__kernel_fpu_end functions have to be called
for the first context. In this case kernel_fpu_begin adds 7
instructions and kernel_fpu_end adds 4 instructions.
3) The inline assemblies in __kernel_fpu_begin / __kernel_fpu_end
to save / restore the vector registers are simplified a bit.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 04864808 18-Feb-2015 Hendrik Brueckner <brueckner@linux.vnet.ibm.com>

s390/vx: add support functions for in-kernel FPU use

Introduce the kernel_fpu_begin() and kernel_fpu_end() function
to enclose any in-kernel use of FPU instructions and registers.
In enclosed sections, you can perform floating-point or vector
(SIMD) computations. The functions take care of saving and
restoring FPU register contents and controls.

For usage details, see the guidelines in arch/s390/include/asm/fpu/api.h

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>