#
64c34318 |
|
13-Mar-2024 |
Sven Schnelle <svens@linux.ibm.com> |
s390/entry: compare gmap asce to determine guest/host fault With the current implementation, there are some cornercases where a host fault would be treated as a guest fault, for example when the sie instruction causes a program check. Therefore store the gmap asce in ptregs, and use that to compare the primary asce from the fault instead of matching instruction addresses. Suggested-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
29e5bc0f |
|
20-Feb-2024 |
Sven Schnelle <svens@linux.ibm.com> |
s390/entry: remove OUTSIDE macro With only one OUTSIDE user left, remove the macro and move the code directly to the machine check handler. This has the advantage that it is much easier to determine which registers are used. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
c239c83e |
|
20-Feb-2024 |
Sven Schnelle <svens@linux.ibm.com> |
s390/entry: add CIF_SIE flag and remove sie64a() address check When a program check, interrupt or machine check is triggered, the PSW address is compared to a certain range of the sie64a() function to figure out whether SIE was interrupted and a cleanup of SIE is needed. This doesn't work with kprobes: If kprobes probes an instruction, it copies the instruction to the kprobes instruction page and overwrites the original instruction with an undefind instruction (Opcode 00). When this instruction is hit later, kprobes single-steps the instruction on the kprobes_instruction page. However, if this instruction is a relative branch instruction it will now point to a different location in memory due to being moved to the kprobes instruction page. If the new branch target points into sie64a() the kernel assumes it interrupted SIE when processing the breakpoint and will crash trying to access the SIE control block. Instead of comparing the address, introduce a new CIF_SIE flag which indicates whether SIE was interrupted. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Suggested-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
ed3a0a01 |
|
03-Feb-2024 |
Heiko Carstens <hca@linux.ibm.com> |
s390/kvm: convert to regular kernel fpu user KVM modifies the kernel fpu's regs pointer to its own area to implement its custom version of preemtible kernel fpu context. With general support for preemptible kernel fpu context there is no need for the extra complexity in KVM code anymore. Therefore convert KVM to a regular kernel fpu user. In particular this means that all TIF_FPU checks can be removed, since the fpu register context will never be changed by other kernel fpu users, and also the fpu register context will be restored if a thread is preempted. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
419abc4d |
|
03-Feb-2024 |
Heiko Carstens <hca@linux.ibm.com> |
s390/fpu: convert FPU CIF flag to regular TIF flag The FPU state, as represented by the CIF_FPU flag reflects the FPU state of a task, not the CPU it is running on. Therefore convert the flag to a regular TIF flag. This removes the magic in switch_to() where a save_fpu_regs() call for the currently (previous) running task sets the per-cpu CIF_FPU flag, which is required to restore FPU register contents of the next task, when it returns to user space. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
fd2527f2 |
|
03-Feb-2024 |
Heiko Carstens <hca@linux.ibm.com> |
s390/fpu: move, rename, and merge header files Move, rename, and merge the fpu and vx header files. This way fpu header files have a consistent naming scheme (fpu*.h). Also get rid of the fpu subdirectory and move header files to asm directory, so that all fpu and vx header files can be found at the same location. Merge internal.h header file into other header files, since the internal helpers are used at many locations. so those helper functions are really not internal. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
9e96afab |
|
03-Feb-2024 |
Heiko Carstens <hca@linux.ibm.com> |
s390/nmi: remove register validation code Remove the historic machine check handler code which validates registers. Registers are automatically validated as part of the machine check handling sequence (see Principles of Operation, Machine-Check Handling chapter, Validation). Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
340750c1 |
|
05-Feb-2024 |
Heiko Carstens <hca@linux.ibm.com> |
s390/switch_to: use generic header file Move the switch_to() implementation to process.c and use the generic switch_to.h header file instead, like some other architectures. This addresses also the oddity that the old switch_to() implementation assigns the return value of __switch_to() to 'prev' instead of 'last', like it should. Remove also all includes of switch_to.h from C files, except process.c. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
b8c723f1 |
|
06-Aug-2023 |
Masahiro Yamada <masahiroy@kernel.org> |
s390: replace #include <asm/export.h> with #include <linux/export.h> Commit ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost") deprecated <asm/export.h>, which is now a wrapper of <linux/export.h>. Replace #include <asm/export.h> with #include <linux/export.h>. After all the <asm/export.h> lines are converted, <asm/export.h> and <asm-generic/export.h> will be removed. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20230806151641.394720-2-masahiroy@kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
efccd4e0 |
|
29-Jun-2023 |
Sven Schnelle <svens@linux.ibm.com> |
s390/entry: remove mcck clock In the past machine checks where accounted as irq time. With the conversion to generic entry, it was decided to account machine checks to the current context. The stckf at the beginning of the machine check handler and the lowcore member is no longer required, therefore remove it. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
#
edbe2898 |
|
18-Apr-2023 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/entry: rework entering DAT-on mode on CPU restart Instead of enforcing PSW_MASK_DAT bit on previously stored in lowcore restart_psw.mask use the PSW_KERNEL_BITS mask (which contains PSW_MASK_DAT) directly. As result, the PSW mask stored in lowcore is only used to enter the CPU restart routine, while PSW_KERNEL_BITS is used to enter the kernel code - similarily to commit 64ea2977add2 ("s390/mm: start kernel with DAT enabled"). Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
#
27d45655 |
|
22-Jun-2023 |
Heiko Carstens <hca@linux.ibm.com> |
s390: consistently use .balign instead of .align The .align directive has inconsistent behavior across architectures. Use .balign instead everywhere. This is a no-op for s390, but with this there is no mix in using .align and .balign anymore. Future code is supposed to use only .balign. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
#
fda1dffa |
|
17-Apr-2023 |
Heiko Carstens <hca@linux.ibm.com> |
s390/entry: use SYM* macros instead of ENTRY(), etc. Consistently use the SYM* family of macros instead of the deprecated ENTRY(), ENDPROC(), etc. family of macros. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
b94c0ebb |
|
27-Mar-2023 |
Heiko Carstens <hca@linux.ibm.com> |
s390: enable HAVE_ARCH_STACKLEAK Add support for the stackleak feature. Whenever the kernel returns to user space the kernel stack is filled with a poison value. Enabling this feature is quite expensive: e.g. after instrumenting the getpid() system call function to have a 4kb stack the result is an increased runtime of the system call by a factor of 3. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
c2c3258f |
|
27-Mar-2023 |
Heiko Carstens <hca@linux.ibm.com> |
s390/stack: use STACK_INIT_OFFSET where possible Make STACK_INIT_OFFSET also available for assembler code, and use it everywhere instead of open-coding it at several places. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
385bf43c |
|
28-Mar-2023 |
Vasily Gorbik <gor@linux.ibm.com> |
s390/entry: rely on long-displacement facility Since commit 4efd417f298b ("s390: raise minimum supported machine generation to z10"), the long-displacement facility is assumed and required for the kernel. Clean up a couple of places in the entry code, where long-displacement could be used directly instead of using a base register. However, there are still a few other places where a base register has to be used to extend short-displacement for the second lowcore page access. Notably, boot/head.S still has to be built for z900, and in mcck_int_handler, spt and lbear, which don't have long-displacements, but need to access save areas at the second lowcore page. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
69a407bf |
|
28-Feb-2023 |
Heiko Carstens <hca@linux.ibm.com> |
s390/bp: remove __bpon() There is no point in changing branch prediction state of a cpu shortly before it enters stop state. Therefore remove __bpon(). Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
f33f2d4c |
|
28-Feb-2023 |
Heiko Carstens <hca@linux.ibm.com> |
s390/bp: remove TIF_ISOLATE_BP TIF_ISOLATE_BP is unused since it was introduced with commit 6b73044b2b00 ("s390: run user space and KVM guests with modified branch prediction"). Given that there is no use case remove it again. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
fed626db |
|
28-Feb-2023 |
Heiko Carstens <hca@linux.ibm.com> |
s390/bp: add missing BPENTER to program check handler When leaving interpretive execution because of a program check BPENTER should be called like it is done on interrupt exit as well. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
e7ec1d2e |
|
17-Dec-2022 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/mcck: cleanup user process termination path If a machine check interrupt hits while user process is running __s390_handle_mcck() helper function is called directly from the interrupt handler and terminates the current process by calling make_task_dead() routine. The make_task_dead() is not allowed to be called from interrupt context which forces the machine check handler switch to the kernel stack and enable local interrupts first. The __s390_handle_mcck() could also be called to service pending work, but this time from the external interrupts handler. It is the machine check handler that establishes the work and schedules the external interrupt, therefore the machine check interrupt itself should be disabled while reading out the corresponding variable: local_mcck_disable(); mcck = *this_cpu_ptr(&cpu_mcck); memset(this_cpu_ptr(&cpu_mcck), 0, sizeof(mcck)); local_mcck_enable(); However, local_mcck_disable() does not have effect when __s390_handle_mcck() is called directly form the machine check handler, since the machine check interrupt is still disabled. Therefore, it is not the opening bracket to the following local_mcck_enable() function. Simplify the user process termination flow by scheduling the external interrupt and killing the affected process from the interrupt context. Assume a kernel-generated signal is always delivered and ignore a value returned by do_send_sig_info() funciton. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
adf1e17e |
|
12-Feb-2023 |
Heiko Carstens <hca@linux.ibm.com> |
s390/entry: remove toolchain dependent micro-optimization Get rid of CONFIG_AS_IS_LLVM in entry.S to make the code a bit more readable. This removes a micro-optimization, but given that the llvm IAS limitation will likely stay, just use the version that works with llvm. See commit 4c25f0ff6336 ("s390/entry: workaround llvm's IAS limitations") for further details. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
6b33e68a |
|
20-Oct-2022 |
Nico Boehr <nrb@linux.ibm.com> |
s390/entry: sort out physical vs virtual pointers usage in sie64a Fix virtual vs physical address confusion (which currently are the same). sie_block is accessed in entry.S and passed it to hardware, which is why both its physical and virtual address are needed. To avoid every caller having to do the virtual-physical conversion, add a new function sie64a() which converts the virtual address to physical. Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20221020143159.294605-3-nrb@linux.ibm.com Message-Id: <20221020143159.294605-3-nrb@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
#
742aed05 |
|
29-Nov-2022 |
Heiko Carstens <hca@linux.ibm.com> |
s390/nmi: move storage error checking back to C, enter with DAT on Checking for storage errors in machine check entry code was done in order to handle also storage errors on kernel page tables. However this is extremely unlikely and some basic assumptions what works on machine check entry are necessary anyway. In order to simplify machine check handling delay checking for storage errors to C code. With this also change the machine check new PSW to have DAT on, which simplifies the entry code even further. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
#
29ccaa4b |
|
22-May-2022 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/mcck: isolate SIE instruction when setting CIF_MCCK_GUEST flag Commit d768bd892fc8 ("s390: add options to change branch prediction behaviour for the kernel") introduced .Lsie_exit label - supposedly to fence off SIE instruction. However, the corresponding address range length .Lsie_crit_mcck_length was not updated, which led to BPON code potentionally marked with CIF_MCCK_GUEST flag. Both .Lsie_exit and .Lsie_crit_mcck_length were removed with commit 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S"), but the issue persisted - currently BPOFF and BPENTER macros might get wrongly considered by the machine check handler as a guest. Fixes: d768bd892fc8 ("s390: add options to change branch prediction behaviour for the kernel") Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
3384f135 |
|
23-May-2022 |
Heiko Carstens <hca@linux.ibm.com> |
s390: generate register offsets into pt_regs automatically Use asm offsets method to generate register offsets into pt_regs, instead of open-coding at several places. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
4c25f0ff |
|
11-May-2022 |
Heiko Carstens <hca@linux.ibm.com> |
s390/entry: workaround llvm's IAS limitations llvm's integrated assembler cannot handle immediate values which are calculated with two local labels: <instantiation>:3:13: error: invalid operand for instruction clgfi %r14,.Lsie_done - .Lsie_gmap Workaround this by adding clang specific code which reads the specific value from memory. Since this code is within the hot paths of the kernel and adds an additional memory reference, keep the original code, and add ifdef'ed code. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20220511120532.2228616-5-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
fad442d3 |
|
11-May-2022 |
Heiko Carstens <hca@linux.ibm.com> |
s390/alternatives: provide identical sized orginal/alternative sequences Explicitly provide identical sized original/alternative instruction sequences. This way there is no need for the s390 specific alternatives infrastructure to generate padding sequences. The code which generates such sequences will be removed with a follow on patch. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20220511120532.2228616-2-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
39d62336 |
|
04-May-2022 |
Thomas Richter <tmricht@linux.ibm.com> |
s390/pai: add support for cryptography counters PMU device driver perf_pai_crypto supports Processor Activity Instrumentation (PAI), available with IBM z16: - maps a full page to lowcore address 0x1500. - uses CR0 bit 13 to turn PAI crypto counting on and off. - creates a sample with raw data on each context switch out when at context switch some mapped counters have a value of nonzero. This device driver only supports CPU wide context, no task context is allowed. Support for counting: - one or more counters can be specified using perf stat -e pai_crypto/xxx/ where xxx stands for the counter event name. Multiple invocation of this command is possible. The counter names are listed in /sys/devices/pai_crypto/events directory. - one special counters can be specified using perf stat -e pai_crypto/CRYPTO_ALL/ which returns the sum of all incremented crypto counters. - one event pai_crypto/CRYPTO_ALL/ is reserved for sampling. No multiple invocations are possible. The event collects data at context switch out and saves them in the ring buffer. Add qpaci assembly instruction to query supported memory mapped crypto counters. It returns the number of counters (no holes allowed in that range). The PAI crypto counter events are system wide and can not be executed in parallel. Therefore some restrictions documented in function paicrypt_busy apply. In particular event CRYPTO_ALL for sampling must run exclusive. Only counting events can run in parallel. PAI crypto counter events can not be created when a CPU hot plug add is processed. This means a CPU hot plug add does not get the necessary PAI event to record PAI cryptography counter increments on the newly added CPU. CPU hot plug remove removes the event and terminates the counting of PAI counters immediately. Co-developed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20220504062351.2954280-3-tmricht@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
29b06ad7 |
|
03-May-2022 |
Heiko Carstens <hca@linux.ibm.com> |
s390/entry: remove broken and not needed code LLVM's integrated assembler reports the following error when compiling entry.S: <instantiation>:38:5: error: unknown token in expression tm %r8,0x0001 # coming from user space? The correct instruction would have been tmhh instead of tm. The current code is doing nothing, since (with gas) it get's translated to a tm instruction which reads from real address 8, which again contains always zero, and therefore the conditional code is never executed. Note that due to the missing displacement gas translates "%r8" into "8(%r0)". Also code inspection reveals that this conditional code is not needed. Therefore remove it. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
6982dba1 |
|
13-Mar-2022 |
Heiko Carstens <hca@linux.ibm.com> |
s390/alternatives: use insn format for new instructions Use insn format with instruction format specifier instead of plain longs. This way it is also more obvious that code instead of data is generated. The generated code is identical. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
731efc96 |
|
25-Feb-2022 |
Vasily Gorbik <gor@linux.ibm.com> |
s390: convert ".insn" encoding to instruction names With z10 as minimum supported machine generation many ".insn" encodings could be now converted to instruction names. There are couple of exceptions - stfle is used from the als code built for z900 and cannot be converted - few ".insn" directives encode unsupported instruction formats The generated code is identical before/after this change. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
10bc15ba |
|
25-Feb-2022 |
Vasily Gorbik <gor@linux.ibm.com> |
s390: assume stckf is always present With z10 as minimum supported machine generation the store-clock-fast facility (25) is always present and checked in als code. Drop alternatives and always use stckf. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
d09a307f |
|
28-Feb-2022 |
Heiko Carstens <hca@linux.ibm.com> |
s390/extable: move EX_TABLE define to asm-extable.h Follow arm64 and riscv and move the EX_TABLE define to asm-extable.h which is a lot less generic than the current linkage.h. Also make sure that all files which contain EX_TABLE usages actually include the new header file. This should make sure that the files always compile and there won't be any random compile breakage due to other header file dependencies. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
f0003a9e |
|
25-Feb-2022 |
Vasily Gorbik <gor@linux.ibm.com> |
s390/entry: remove unused expoline thunk Remove __s390_indirect_jump_r13use_r14 expoline thunk unused since commit fbbdfca5c553 ("s390/entry.S: factor out SIEEXIT macro"). Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
f36e7c98 |
|
27-Jan-2022 |
Heiko Carstens <hca@linux.ibm.com> |
s390: remove invalid email address of Heiko Carstens Remove my old invalid email address which can be found in a couple of files. Instead of updating it, just remove my contact data completely from source files. We have git and other tools which allow to figure out who is responsible for what with recent contact data. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
3b051e89 |
|
07-Apr-2021 |
Sven Schnelle <svens@linux.ibm.com> |
s390: add support for BEAR enhancement facility The Breaking-Event-Address-Register (BEAR) stores the address of the last breaking event instruction. Breaking events are usually instructions that change the program flow - for example branches, and instructions that modify the address in the PSW like lpswe. This is useful for debugging wild branches, because one could easily figure out where the wild branch was originating from. What is problematic is that lpswe is considered a breaking event, and therefore overwrites BEAR on kernel exit. The BEAR enhancement facility adds new instructions that allow to save/restore BEAR and also an lpswey instruction that doesn't cause a breaking event. So we can save BEAR on kernel entry and restore it on exit to user space. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
15256194 |
|
27-Aug-2021 |
Heiko Carstens <hca@linux.ibm.com> |
s390/entry: make oklabel within CHKSTG macro local Make the oklabel within the CHKSTG macro local. This makes sure that tools like objdump and the crash debugging tool still disassemble full functions where the macro has been used instead of stopping half way where such a global label is used and one has to guess how to disassemble the rest of such a function: E.g.: 0000000000cb0270 <mcck_int_handler>: cb0270: b2 05 03 20 stck 800 ... cb0354: a7 74 00 97 jne cb0482 <oklabel270+0xe2> 0000000000cb0358 <oklabel243>: cb0358: c0 e0 00 22 4e 8f larl %r14,10fa076 <opcode+0x2558> ... Fixes: d35925b34996 ("s390/mcck: move storage error checks to assembler") Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
915fea04 |
|
24-Aug-2021 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/smp: enable DAT before CPU restart callback is called The restart interrupt is triggered whenever a secondary CPU is brought online, a remote function call dispatched from another CPU or a manual PSW restart is initiated and causes the system to kdump. The handling routine is always called with DAT turned off. It then initializes the stack frame and invokes a callback. The existing callbacks handle DAT as follows: * __do_restart() and __machine_kexec() turn in on upon entry; * __ipl_run(), __reipl_run() and __dump_run() do not turn it right away, but all of them call diag308() - which turns DAT on, but only if kasan is enabled; In addition to the described complexity all callbacks (and the functions they call) should avoid kasan instrumentation while DAT is off. This update enables DAT in the assembler restart handler and relieves any callbacks (which are mostly C functions) from dealing with DAT altogether. There are four types of CPU restart that initialize control registers in different ways: 1. Start of secondary CPU on boot - control registers are inherited from the IPL CPU; 2. Restart of online CPU - control registers of the CPU being restarted are kept; 3. Hotplug of offline CPU - control registers are inherited from the starting CPU; 4. Start of offline CPU triggered by manual PSW restart - the control registers are read from the absolute lowcore and contain the boot time IPL CPU values updated with all follow-up calls of smp_ctl_set_bit() and smp_ctl_clear_bit() routines; In first three cases contents of the control registers is the most recent. In the latter case control registers are good enough to facilitate successful completion of kdump operation. Suggested-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
5fa2ea07 |
|
18-Jun-2021 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/mcck: move register validation to C code This update partially reverts commit 3037a52f9846 ("s390/nmi: do register validation as early as possible"). Storage error checks and control registers validation are left in the assembler code, since correct ASCEs and page tables are required to enable DAT - which is done before the C handler is entered. System damage, kernel instruction address and PSW MWP checks are left in the assembler code as well, since there is no way to proceed if one of these checks is failed. The getcpu vdso syscall reads CPU number from the programmable field of the TOD clock. Disregard the TOD programmable register validity bit and load the CPU number into the TOD programmable field unconditionally. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
d35925b3 |
|
18-Jun-2021 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/mcck: move storage error checks to assembler The current storage errors tackling is wrong - the DAT is enabled in assembler code before the actual storage checks in C half are executed. In case the page tables themselves are damaged such approach is not going to work. With this update unrecoverable storage errors are not passed to C code for handling, but rather the machine is stopped right away. The only exception to this flow is when a machine check occurred in KVM guest - in this case the errors are reinjected by the handler. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
7f6dc8d4 |
|
18-Jun-2021 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/mcck: always enter C handler with DAT enabled The machine check handler must be entered with DAT disabled in case control registers are corrupted or a storage error happened and we can not tell if such error corresponds to a page table. Both of described conditions end up in stopping all CPUs and entering the disabled wait in C half of the handler. However, the storage errors are still checked after the DAT is enabled and C code is entered. In case a page table is damaged such flow is not expected to work. This update paves the way for moving the storage error checks from C to assembler half. All fatal errors that can only be handled with DAT disabled are handled in assembler half also. As result, the C half is only entered if the DAT is secured. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
e2c13d64 |
|
22-Jun-2021 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/mcck: optimize user mode check in case of !CONFIG_KVM In case of the !CONFIG_KVM use "jz" instead of "jnz" when detecting user mode and get rid of unnecessary jump as result. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Christia Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
fbbdfca5 |
|
18-Jun-2021 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/entry.S: factor out SIEEXIT macro Factor out SIEEXIT macro and use it instead of cleanup_sie routine. As a side effect %r13 and %r14 are spared. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Christia Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
b5415c8f |
|
06-Jun-2021 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/entry.S: factor out OUTSIDE macro Introduce OUTSIDE macro that checks whether an instruction address is inside or outside of a block of instructions. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
20232b18 |
|
17-May-2021 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/mcck: cleanup use of cleanup_sie_mcck cleanup_sie_mcck label is called from a single location only and thus does not need to be a subroutine. Move the labelled code to the caller - by doing that the SIE critical section checks appear next to each other and the SIE cleanup becomes bit more readable. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
af9ad822 |
|
05-May-2021 |
Sven Schnelle <svens@linux.ibm.com> |
s390/entry: use assignment to read intcode / asm to copy gprs arch/s390/kernel/syscall.c: In function __do_syscall: arch/s390/kernel/syscall.c:147:9: warning: memcpy reading 64 bytes from a region of size 0 [-Wstringop-overread] 147 | memcpy(®s->gprs[8], S390_lowcore.save_area_sync, 8 * sizeof(unsigned long)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/s390/kernel/syscall.c:148:9: warning: memcpy reading 4 bytes from a region of size 0 [-Wstringop-overread] 148 | memcpy(®s->int_code, &S390_lowcore.svc_ilc, sizeof(regs->int_code)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by moving the gprs restore from C to assembly, and use a assignment for int_code instead of memcpy. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
ca1f4d70 |
|
11-Jun-2021 |
Sven Schnelle <svens@linux.ibm.com> |
s390: clear pt_regs::flags on irq entry The current irq entry code doesn't initialize pt_regs::flags. On exit to user mode arch_do_signal_or_restart() tests whether PIF_SYSCALL is set, which might yield wrong results. Fix this by clearing pt_regs::flags in the entry.S irq handler code. Reported-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Fixes: 56e62a737028 ("s390: convert to generic entry") Cc: <stable@vger.kernel.org> # 5.12 Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
1874cb13 |
|
17-May-2021 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/mcck: fix invalid KVM guest condition check Wrong condition check is used to decide if a machine check hit while in KVM guest. As result of this check the instruction following the SIE critical section might be considered as still in KVM guest and _CIF_MCCK_GUEST CPU flag mistakenly set as result. Fixes: c929500d7a5a ("s390/nmi: s390: New low level handling for machine check happening in guest") Cc: <stable@vger.kernel.org> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
5bcbe328 |
|
17-May-2021 |
Alexander Gordeev <agordeev@linux.ibm.com> |
s390/mcck: fix calculation of SIE critical section size The size of SIE critical section is calculated wrongly as result of a missed subtraction in commit 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Cc: <stable@vger.kernel.org> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
a994eddb |
|
08-Apr-2021 |
Vasily Gorbik <gor@linux.ibm.com> |
s390/entry: save the caller of psw_idle Currently psw_idle does not allocate a stack frame and does not save its r14 and r15 into the save area. Even though this is valid from call ABI point of view, because psw_idle does not make any calls explicitly, in reality psw_idle is an entry point for controlled transition into serving interrupts. So, in practice, psw_idle stack frame is analyzed during stack unwinding. Depending on build options that r14 slot in the save area of psw_idle might either contain a value saved by previous sibling call or complete garbage. [task 0000038000003c28] do_ext_irq+0xd6/0x160 [task 0000038000003c78] ext_int_handler+0xba/0xe8 [task *0000038000003dd8] psw_idle_exit+0x0/0x8 <-- pt_regs ([task 0000038000003dd8] 0x0) [task 0000038000003e10] default_idle_call+0x42/0x148 [task 0000038000003e30] do_idle+0xce/0x160 [task 0000038000003e70] cpu_startup_entry+0x36/0x40 [task 0000038000003ea0] arch_call_rest_init+0x76/0x80 So, to make a stacktrace nicer and actually point for the real caller of psw_idle in this frequently occurring case, make psw_idle save its r14. [task 0000038000003c28] do_ext_irq+0xd6/0x160 [task 0000038000003c78] ext_int_handler+0xba/0xe8 [task *0000038000003dd8] psw_idle_exit+0x0/0x6 <-- pt_regs ([task 0000038000003dd8] arch_cpu_idle+0x3c/0xd0) [task 0000038000003e10] default_idle_call+0x42/0x148 [task 0000038000003e30] do_idle+0xce/0x160 [task 0000038000003e70] cpu_startup_entry+0x36/0x40 [task 0000038000003ea0] arch_call_rest_init+0x76/0x80 Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
b74e409e |
|
08-Apr-2021 |
Vasily Gorbik <gor@linux.ibm.com> |
s390/entry: avoid setting up backchain in ext|io handlers Currently when interrupt arrives to cpu while in kernel context INT_HANDLER macro (used for ext_int_handler and io_int_handler) allocates new stack frame and pt_regs on the kernel stack and sets up the backchain to jump over the pt_regs to the frame which has been interrupted. This is not ideal to two reasons: 1. This hides the fact that kernel stack contains interrupt frame in it and hence breaks arch_stack_walk_reliable(), which needs to know that to guarantee "reliability" and checks that there are no pt_regs on the way. 2. It breaks the backchain unwinder logic, which assumes that the next stack frame after an interrupt frame is reliable, while it is not. In some cases (when r14 contains garbage) this leads to early unwinding termination with an error, instead of marking frame as unreliable and continuing. To address that, only set backchain to 0. Fixes: 56e62a737028 ("s390: convert to generic entry") Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
efa54735 |
|
03-Feb-2021 |
Sven Schnelle <svens@linux.ibm.com> |
s390: split cleanup_sie The current code uses the address in %r11 to figure out whether it was called from the machine check handler or from a normal interrupt handler. Instead of doing this implicit logic (which is mostly a leftover from the old critical cleanup approach) just add a second label and use that. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
33ea0487 |
|
03-Feb-2021 |
Sven Schnelle <svens@linux.ibm.com> |
s390: use r13 in cleanup_sie as temp register Instead of thrashing r11 which is normally our pointer to struct pt_regs on the stack, use r13 as temporary register in the BR_EX macro. r13 is already used in cleanup_sie, so no need to thrash another register. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
26521412 |
|
03-Feb-2021 |
Sven Schnelle <svens@linux.ibm.com> |
s390: fix kernel asce loading when sie is interrupted If a machine check is coming in during sie, the PU saves the control registers to the machine check save area. Afterwards mcck_int_handler is called, which loads __LC_KERNEL_ASCE into %cr1. Later the code restores %cr1 from the machine check area, but that is wrong when SIE was interrupted because the machine check area still contains the gmap asce. Instead it should return with either __KERNEL_ASCE in %cr1 when interrupted in SIE or the previous %cr1 content saved in the machine check save area. Fixes: 87d598634521 ("s390/mm: remove set_fs / rework address space handling") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Cc: <stable@kernel.org> # v5.8+ Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
b61b1595 |
|
03-Feb-2021 |
Sven Schnelle <svens@linux.ibm.com> |
s390: add stack for machine check handler The previous code used the normal kernel stack for machine checks. This is problematic when a machine check interrupts a system call or interrupt handler right at the beginning where registers are set up. Assume system_call is interrupted at the first instruction and a machine check is triggered. The machine check handler is called, checks the PSW to see whether it is coming from user space, notices that it is already in kernel mode but %r15 still contains the user space stack. This would lead to a kernel crash. There are basically two ways of fixing that: Either using the 'critical cleanup' approach which compares the address in the PSW to see whether it is already at a point where the stack has been set up, or use an extra stack for the machine check handler. For simplicity, we will go with the second approach and allocate an extra stack. This adds some memory overhead for large systems, but usually large system have plenty of memory so this isn't really a concern. But it keeps the mchk stack setup simple and less error prone. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Cc: <stable@kernel.org> # v5.8+ Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
b0d31159 |
|
28-Jan-2021 |
Sven Schnelle <svens@linux.ibm.com> |
s390: open code SWITCH_KERNEL macro This is a preparation patch for two later bugfixes. In the past both int_handler and machine check handler used SWITCH_KERNEL to switch to the kernel stack. However, SWITCH_KERNEL doesn't work properly in machine check context. So instead of adding more complexity to this macro, just remove it. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Cc: <stable@kernel.org> # v5.8+ Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
78f65709 |
|
02-Feb-2021 |
Heiko Carstens <hca@linux.ibm.com> |
s390/entry: use cpu alternative for stck/stckf Use a cpu alternative to switch between stck and stckf instead of making it compile time dependent. This will also make kernels compiled for old machines, but running on newer machines, use stckf. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
56e62a73 |
|
21-Nov-2020 |
Sven Schnelle <svens@linux.ibm.com> |
s390: convert to generic entry This patch converts s390 to use the generic entry infrastructure from kernel/entry/*. There are a few special things on s390: - PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't know about our PIF flags in exit_to_user_mode_loop(). - The old code had several ways to restart syscalls: a) PIF_SYSCALL_RESTART, which was only set during execve to force a restart after upgrading a process (usually qemu-kvm) to pgste page table extensions. b) PIF_SYSCALL, which is set by do_signal() to indicate that the current syscall should be restarted. This is changed so that do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use PIF_SYSCALL doesn't work with the generic code, and changing it to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART more unique. - On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by executing a svc instruction on the process stack which causes a fault. While handling that fault the fault code sets PIF_SYSCALL to hand over processing to the syscall code on exit to usermode. The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets a return value for a syscall. The s390x ptrace ABI uses r2 both for the syscall number and return value, so ptrace cannot set the syscall number + return value at the same time. The flag makes handling that a bit easier. do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET is set. CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY. CR1/7/13 will be checked both on kernel entry and exit to contain the correct asces. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
dd6cfe55 |
|
03-Dec-2020 |
Heiko Carstens <hca@linux.ibm.com> |
s390/delay: simplify udelay udelay is implemented by using quite subtle details to make it possible to load an idle psw and waiting for an interrupt even in irq context or when interrupts are disabled. Also handling (or better: no handling) of softirqs is taken into account. All this is done to optimize for something which should in normal circumstances never happen: calling udelay to busy wait. Therefore get rid of the whole complexity and just busy loop like other architectures are doing it also. It could have been possible to use diag 0x44 instead of cpu_relax() in the busy loop, however we have seen too many bad things happen with diag 0x44 that it seems to be better to simply busy loop. Also note that with this new implementation kernel preemption does work when within the udelay loop. This did not work before. To get a feeling what the former code optimizes for: IPL'ing a kernel with 'defconfig' and afterwards compiling a kernel ends with a total of zero udelay calls. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
f0c7cf13 |
|
04-Dec-2020 |
Heiko Carstens <hca@linux.ibm.com> |
s390: make calls to TRACE_IRQS_OFF/TRACE_IRQS_ON balanced In case of udelay CIF_IGNORE_IRQ is set. This leads to an unbalanced call of TRACE_IRQS_OFF and TRACE_IRQS_ON. That is: from lockdep's point of view TRACE_IRQS_ON is called one time too often. This doesn't fix any real bug, just makes the calls balanced. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
9365965d |
|
04-Dec-2020 |
Heiko Carstens <hca@linux.ibm.com> |
s390: always clear kernel stack backchain before calling functions Clear the kernel stack backchain before potentially calling the lockdep trace_hardirqs_off/on functions. Without this walking the kernel backchain, e.g. during a panic, might stop too early. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
454efcf8 |
|
06-Dec-2020 |
Sven Schnelle <svens@linux.ibm.com> |
s390/idle: fix accounting with machine checks When a machine check interrupt is triggered during idle, the code is using the async timer/clock for idle time calculation. It should use the machine check enter timer/clock which is passed to the macro. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Cc: <stable@vger.kernel.org> # 5.8 Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
e259b3fa |
|
03-Dec-2020 |
Sven Schnelle <svens@linux.ibm.com> |
s390/idle: add missing mt_cycles calculation During removal of the critical section cleanup the calculation of mt_cycles during idle was removed. This causes invalid accounting on systems with SMT enabled. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Cc: <stable@vger.kernel.org> # 5.8 Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
b1cae1f8 |
|
02-Dec-2020 |
Heiko Carstens <hca@linux.ibm.com> |
s390: fix irq state tracing With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing") common code calls arch_cpu_idle() with a lockdep state that tells irqs are on. This doesn't work very well for s390: psw_idle() will enable interrupts to wait for an interrupt. As soon as an interrupt occurs the interrupt handler will verify if the old context was psw_idle(). If that is the case the interrupt enablement bits in the old program status word will be cleared. A subsequent test in both the external as well as the io interrupt handler checks if in the old context interrupts were enabled. Due to the above patching of the old program status word it is assumed the old context had interrupts disabled, and therefore a call to TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in turn makes lockdep incorrectly "think" that interrupts are enabled within the interrupt handler. Fix this by unconditionally calling TRACE_IRQS_OFF when entering interrupt handlers. Also call unconditionally TRACE_IRQS_ON when leaving interrupts handlers. This leaves the special psw_idle() case, which now returns with interrupts disabled, but has an "irqs on" lockdep state. So callers of psw_idle() must adjust the state on their own, if required. This is currently only __udelay_disabled(). Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing") Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
062e5279 |
|
16-Nov-2020 |
Heiko Carstens <hca@linux.ibm.com> |
s390/mm: add debug user asce support Verify on exit to user space that always - the primary ASCE (cr1) is set to kernel ASCE - the secondary ASCE (cr7) is set to user ASCE If this is not the case: panic since something went terribly wrong. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
87d59863 |
|
16-Nov-2020 |
Heiko Carstens <hca@linux.ibm.com> |
s390/mm: remove set_fs / rework address space handling Remove set_fs support from s390. With doing this rework address space handling and simplify it. As a result address spaces are now setup like this: CPU running in | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE ----------------------------|-----------|-----------|----------- user space | user | user | kernel kernel, normal execution | kernel | user | kernel kernel, kvm guest execution | gmap | user | kernel To achieve this the getcpu vdso syscall is removed in order to avoid secondary address mode and a separate vdso address space in for user space. The getcpu vdso syscall will be implemented differently with a subsequent patch. The kernel accesses user space always via secondary address space. This happens in different ways: - with mvcos in home space mode and directly read/write to secondary address space - with mvcs/mvcp in primary space mode and copy from primary space to secondary space or vice versa - with e.g. cs in secondary space mode and access secondary space Switching translation modes happens with sacf before and after instructions which access user space, like before. Lazy handling of control register reloading is removed in the hope to make everything simpler, but at the cost of making kernel entry and exit a bit slower. That is: on kernel entry the primary asce is always changed to contain the kernel asce, and on kernel exit the primary asce is changed again so it contains the user asce. In kernel mode there is only one exception to the primary asce: when kvm guests are executed the primary asce contains the gmap asce (which describes the guest address space). The primary asce is reset to kernel asce whenever kvm guest execution is interrupted, so that this doesn't has to be taken into account for any user space accesses. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
1179f170 |
|
20-Nov-2020 |
Sven Schnelle <svens@linux.ibm.com> |
s390: fix fpu restore in entry.S We need to disable interrupts in load_fpu_regs(). Otherwise an interrupt might come in after the registers are loaded, but before CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns, CIF_FPU will be cleared and the registers will never be restored. The entry.S code usually saves the interrupt state in __SF_EMPTY on the stack when disabling/restoring interrupts. sie64a however saves the pointer to the sie control block in __SF_SIE_CONTROL, which references the same location. This is non-obvious to the reader. To avoid thrashing the sie control block pointer in load_fpu_regs(), move the __SIE_* offsets eight bytes after __SF_EMPTY on the stack. Cc: <stable@vger.kernel.org> # 5.8 Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Reported-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
0cd9b723 |
|
11-Nov-2020 |
Heiko Carstens <hca@linux.ibm.com> |
s390: add separate program check exit path System call and program check handler both use the system call exit path when returning to previous context. However the program check handler jumps right to the end of the system call exit path if the previous context is kernel context. This lead to the quite odd double disabling of interrupts in the system call exit path introduced with commit ce9dfafe29be ("s390: fix system call exit path"). To avoid that have a separate program check handler exit path if the previous context is kernel context. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
75309018 |
|
09-Oct-2020 |
Jens Axboe <axboe@kernel.dk> |
s390: add support for TIF_NOTIFY_SIGNAL Wire up TIF_NOTIFY_SIGNAL handling for s390. Cc: linux-s390@vger.kernel.org Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ce9dfafe |
|
03-Nov-2020 |
Heiko Carstens <hca@linux.ibm.com> |
s390: fix system call exit path The system call exit path is running with interrupts enabled while checking for TIF/PIF/CIF bits which require special handling. If all bits have been checked interrupts are disabled and the kernel exits to user space. The problem is that after checking all bits and before interrupts are disabled bits can be set already again, due to interrupt handling. This means that the kernel can exit to user space with some TIF/PIF/CIF bits set, which should never happen. E.g. TIF_NEED_RESCHED might be set, which might lead to additional latencies, since that bit will only be recognized with next exit to user space. Fix this by checking the corresponding bits only when interrupts are disabled. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Cc: <stable@vger.kernel.org> # 5.8 Acked-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
4bff8cb5 |
|
28-Apr-2020 |
Sven Schnelle <svens@linux.ibm.com> |
s390: convert to GENERIC_VDSO Convert s390 to generic vDSO. There are a few special things on s390: - vDSO can be called without a stack frame - glibc did this in the past. So we need to allocate a stackframe on our own. - The former assembly code used stcke to get the TOD clock and applied time steering to it. We need to do the same in the new code. This is done in the architecture specific __arch_get_hw_counter function. The steering information is stored in an architecure specific area in the vDSO data. - CPUCLOCK_VIRT is now handled with a syscall fallback, which might be slower/less accurate than the old implementation. The getcpu() function stays as an assembly function because there is no generic implementation and the code is just a few lines. Performance number from my system do 100 mio gettimeofday() calls: Plain syscall: 8.6s Generic VDSO: 1.3s old ASM VDSO: 1s So it's a bit slower but still much faster than syscalls. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
7b7735c5 |
|
07-Jul-2020 |
Christian Borntraeger <borntraeger@de.ibm.com> |
s390: fix comment regarding interrupts in svc With the removal of the critical section cleanup, we now enter the svc interrupt handler with interrupts disabled. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
e64a1618 |
|
17-Jun-2020 |
Sven Schnelle <svens@linux.ibm.com> |
s390: fix system call single stepping When single stepping an svc instruction on s390, the kernel is entered with a PER program check interruption. The program check handler than jumps to the system call handler by reloading the PSW. The code didn't set GPR13 to the thread pointer in struct task_struct. This made the kernel access invalid memory while trying to fetch the syscall function address. Fix this by always assigned GPR13 after .Lsysc_per. Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
00332c16 |
|
06-Mar-2020 |
Sven Schnelle <svens@linux.ibm.com> |
s390/ptrace: pass invalid syscall numbers to tracing tracing expects to see invalid syscalls, so pass it through. The syscall path in entry.S checks the syscall number before looking up the handler, so it is still safe. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
0b0ed657 |
|
19-Feb-2020 |
Sven Schnelle <svens@linux.ibm.com> |
s390: remove critical section cleanup from entry.S The current code is rather complex and caused a lot of subtle and hard to debug bugs in the past. Simplify the code by calling the system_call handler with interrupts disabled, save machine state, and re-enable them later. This requires significant changes to the machine check handling code as well. When the machine check interrupt arrived while being in kernel mode the new code will signal pending machine checks with a SIGP external call. When userspace was interrupted, the handler will switch to the kernel stack and directly execute s390_handle_mcck(). Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
0b38b5e1 |
|
22-Jan-2020 |
Sven Schnelle <svens@linux.ibm.com> |
s390: prevent leaking kernel address in BEAR When userspace executes a syscall or gets interrupted, BEAR contains a kernel address when returning to userspace. This make it pretty easy to figure out where the kernel is mapped even with KASLR enabled. To fix this, add lpswe to lowcore and always execute it there, so userspace sees only the lowcore address of lpswe. For this we have to extend both critical_cleanup and the SWITCH_ASYNC macro to also check for lpswe addresses in lowcore. Fixes: b2d24b97b2a9 ("s390/kernel: add support for kernel address space layout randomization (KASLR)") Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
fa686453 |
|
15-Oct-2019 |
Thomas Gleixner <tglx@linutronix.de> |
sched/rt, s390: Use CONFIG_PREEMPTION CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the preemption and entry code over to use CONFIG_PREEMPTION. Add PREEMPT_RT output to die(). [bigeasy: +Kconfig, dumpstack.c] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: linux-s390@vger.kernel.org Link: https://lore.kernel.org/r/20191015191821.11479-18-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
67626fad |
|
03-Jun-2019 |
Heiko Carstens <hca@linux.ibm.com> |
s390: enforce CONFIG_SMP There never have been distributions that shiped with CONFIG_SMP=n for s390. In addition the kernel currently doesn't even compile with CONFIG_SMP=n for s390. Most likely it wouldn't even work, even if we fix the compile error, since nobody tests it, since there is no use case that I can think of. Therefore simply enforce CONFIG_SMP and get rid of some more or less unused code. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
26a374ae |
|
17-Jan-2019 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: add missing ENDPROC statements to assembler functions The assembler code in arch/s390 misses proper ENDPROC statements to properly end functions in a few places. Add them. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
ff4a742d |
|
03-Feb-2019 |
Gerald Schaefer <gerald.schaefer@linux.ibm.com> |
s390/kernel: convert SYSCALL and PGM_CHECK handlers to .quad With a relocatable kernel that could reside at any place in memory, the storage size for the SYSCALL and PGM_CHECK handlers needs to be increased from .long to .quad. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Reviewed-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
aa0d6e70 |
|
16-Jan-2019 |
Arnd Bergmann <arnd@arndb.de> |
s390: autogenerate compat syscall wrappers Any system call that takes a pointer argument on s390 requires a wrapper function to do a 31-to-64 zero-extension, these are currently generated in arch/s390/kernel/compat_wrapper.c. On arm64 and x86, we already generate similar wrappers for all system calls in the place of their definition, just for a different purpose (they load the arguments from pt_regs). We can do the same thing here, by adding an asm/syscall_wrapper.h file with a copy of all the relevant macros to override the generic version. Besides the addition of the compat entry point, these also rename the entry points with a __s390_ or __s390x_ prefix, similar to what we do on arm64 and x86. This in turn requires renaming a few things, and adding a proper ni_syscall() entry point. In order to still compile system call definitions that pass an loff_t argument, the __SC_COMPAT_CAST() macro checks for that and forces an -ENOSYS error, which was the best I could come up with. Those functions must obviously not get called from user space, but instead require hand-written compat_sys_*() handlers, which fortunately already exist. Link: https://lore.kernel.org/lkml/20190116131527.2071570-5-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> [heiko.carstens@de.ibm.com: compile fix for !CONFIG_COMPAT] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
9fed920e |
|
26-Oct-2018 |
Vasily Gorbik <gor@linux.ibm.com> |
s390/kasan: increase instrumented stack size to 64k Increase kasan instrumented kernel stack size from 32k to 64k. Other architectures seems to get away with just doubling kernel stack size under kasan, but on s390 this appears to be not enough due to bigger frame size. The particular pain point is kasan inlined checks (CONFIG_KASAN_INLINE vs CONFIG_KASAN_OUTLINE). With inlined checks one particular case hitting stack overflow is fs sync on xfs filesystem: #0 [9a0681e8] 704 bytes check_usage at 34b1fc #1 [9a0684a8] 432 bytes check_usage at 34c710 #2 [9a068658] 1048 bytes validate_chain at 35044a #3 [9a068a70] 312 bytes __lock_acquire at 3559fe #4 [9a068ba8] 440 bytes lock_acquire at 3576ee #5 [9a068d60] 104 bytes _raw_spin_lock at 21b44e0 #6 [9a068dc8] 1992 bytes enqueue_entity at 2dbf72 #7 [9a069590] 1496 bytes enqueue_task_fair at 2df5f0 #8 [9a069b68] 64 bytes ttwu_do_activate at 28f438 #9 [9a069ba8] 552 bytes try_to_wake_up at 298c4c #10 [9a069dd0] 168 bytes wake_up_worker at 23f97c #11 [9a069e78] 200 bytes insert_work at 23fc2e #12 [9a069f40] 648 bytes __queue_work at 2487c0 #13 [9a06a1c8] 200 bytes __queue_delayed_work at 24db28 #14 [9a06a290] 248 bytes mod_delayed_work_on at 24de84 #15 [9a06a388] 24 bytes kblockd_mod_delayed_work_on at 153e2a0 #16 [9a06a3a0] 288 bytes __blk_mq_delay_run_hw_queue at 158168c #17 [9a06a4c0] 192 bytes blk_mq_run_hw_queue at 1581a3c #18 [9a06a580] 184 bytes blk_mq_sched_insert_requests at 15a2192 #19 [9a06a638] 1024 bytes blk_mq_flush_plug_list at 1590f3a #20 [9a06aa38] 704 bytes blk_flush_plug_list at 1555028 #21 [9a06acf8] 320 bytes schedule at 219e476 #22 [9a06ae38] 760 bytes schedule_timeout at 21b0aac #23 [9a06b130] 408 bytes wait_for_common at 21a1706 #24 [9a06b2c8] 360 bytes xfs_buf_iowait at fa1540 #25 [9a06b430] 256 bytes __xfs_buf_submit at fadae6 #26 [9a06b530] 264 bytes xfs_buf_read_map at fae3f6 #27 [9a06b638] 656 bytes xfs_trans_read_buf_map at 10ac9a8 #28 [9a06b8c8] 304 bytes xfs_btree_kill_root at e72426 #29 [9a06b9f8] 288 bytes xfs_btree_lookup_get_block at e7bc5e #30 [9a06bb18] 624 bytes xfs_btree_lookup at e7e1a6 #31 [9a06bd88] 2664 bytes xfs_alloc_ag_vextent_near at dfa070 #32 [9a06c7f0] 144 bytes xfs_alloc_ag_vextent at dff3ca #33 [9a06c880] 1128 bytes xfs_alloc_vextent at e05fce #34 [9a06cce8] 584 bytes xfs_bmap_btalloc at e58342 #35 [9a06cf30] 1336 bytes xfs_bmapi_write at e618de #36 [9a06d468] 776 bytes xfs_iomap_write_allocate at ff678e #37 [9a06d770] 720 bytes xfs_map_blocks at f82af8 #38 [9a06da40] 928 bytes xfs_writepage_map at f83cd6 #39 [9a06dde0] 320 bytes xfs_do_writepage at f85872 #40 [9a06df20] 1320 bytes write_cache_pages at 73dfe8 #41 [9a06e448] 208 bytes xfs_vm_writepages at f7f892 #42 [9a06e518] 88 bytes do_writepages at 73fe6a #43 [9a06e570] 872 bytes __writeback_single_inode at a20cb6 #44 [9a06e8d8] 664 bytes writeback_sb_inodes at a23be2 #45 [9a06eb70] 296 bytes __writeback_inodes_wb at a242e0 #46 [9a06ec98] 928 bytes wb_writeback at a2500e #47 [9a06f038] 848 bytes wb_do_writeback at a260ae #48 [9a06f388] 536 bytes wb_workfn at a28228 #49 [9a06f5a0] 1088 bytes process_one_work at 24a234 #50 [9a06f9e0] 1120 bytes worker_thread at 24ba26 #51 [9a06fe40] 104 bytes kthread at 26545a #52 [9a06fea8] kernel_thread_starter at 21b6b62 To be able to increase the stack size to 64k reuse LLILL instruction in __switch_to function to load 64k - STACK_FRAME_OVERHEAD - __PT_SIZE (65192) value as unsigned. Reported-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
ce3dc447 |
|
12-Sep-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: add support for virtually mapped kernel stacks With virtually mapped kernel stacks the kernel stack overflow detection is now fault based, every stack has a guard page in the vmalloc space. The panic_stack is renamed to nodat_stack and is used for all function that need to run without DAT, e.g. memcpy_real or do_start_kdump. The main effect is a reduction in the kernel image size as with vmap stacks the old style overflow checking that adds two instructions per function is not needed anymore. Result from bloat-o-meter: add/remove: 20/1 grow/shrink: 13/26854 up/down: 2198/-216240 (-214042) In regard to performance the micro-benchmark for fork has a hit of a few microseconds, allocating 4 pages in vmalloc space is more expensive compare to an order-2 page allocation. But with real workload I could not find a noticeable difference. Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
9d6d99e3 |
|
30-Jun-2018 |
Heiko Carstens <hca@linux.ibm.com> |
s390: wire up rseq system call Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
891f6a72 |
|
21-Jun-2018 |
Christian Borntraeger <borntraeger@de.ibm.com> |
s390: Correct register corruption in critical section cleanup In the critical section cleanup we must not mess with r1. For march=z9 or older, larl + ex (instead of exrl) are used with r1 as a temporary register. This can clobber r1 in several interrupt handlers. Fix this by using r11 as a temp register. r11 is being saved by all callers of cleanup_critical. Fixes: 6dd85fbb87 ("s390: move expoline assembler macros to a header") Cc: stable@vger.kernel.org #v4.16 Reported-by: Oliver Kurz <okurz@suse.com> Reported-by: Petr Tesařík <ptesarik@suse.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
6dd85fbb |
|
20-Apr-2018 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: move expoline assembler macros to a header To be able to use the expoline branches in different assembler files move the associated macros from entry.S to a new header nospec-insn.h. While we are at it make the macros a bit nicer to use. Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
92fa7a13 |
|
20-Mar-2018 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/kvm: improve stack frame constants in entry.S The code in sie64a uses the stack frame passed to the function to store some temporary data in the empty1 array (see struct stack_frame in asm/processor.h. Replace the __SF_EMPTY+x constants with a properly defined offset: s/__SF_EMPTY/__SF_SIE_CONTROL/, s/__SF_EMPTY+8/__SF_SIE_SAVEAREA/, s/__SF_EMPTY+16/__SF_SIE_REASON/, s/__SF_EMPTY+24/__SF_SIE_FLAGS/. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
e5b98199 |
|
26-Mar-2018 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/lpp: use assembler alternatives for the LPP instruction With the new macros for CPU alternatives the MACHINE_FLAG_LPP check around the LPP instruction can be optimized. After this is done the flag can be removed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
b058661a |
|
26-Mar-2018 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/entry.S: use assembler alternatives Replace the open coded alternatives for the BPOFF, BPON, BPENTER, and BPEXIT macros with the new magic from asm/alternatives-asm.h to make the code easier to read. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
d3f46896 |
|
05-Mar-2018 |
Christian Borntraeger <borntraeger@de.ibm.com> |
s390/entry.S: fix spurious zeroing of r0 when a system call is interrupted we might call the critical section cleanup handler that re-does some of the operations. When we are between .Lsysc_vtime and .Lsysc_do_svc we might also redo the saving of the problem state registers r0-r7: .Lcleanup_system_call: [...] 0: # update accounting time stamp mvc __LC_LAST_UPDATE_TIMER(8),__LC_SYNC_ENTER_TIMER # set up saved register r11 lg %r15,__LC_KERNEL_STACK la %r9,STACK_FRAME_OVERHEAD(%r15) stg %r9,24(%r11) # r11 pt_regs pointer # fill pt_regs mvc __PT_R8(64,%r9),__LC_SAVE_AREA_SYNC ---> stmg %r0,%r7,__PT_R0(%r9) The problem is now, that we might have already zeroed out r0. The fix is to move the zeroing of r0 after sysc_do_svc. Reported-by: Farhan Ali <alifm@linux.vnet.ibm.com> Fixes: 7041d28115e91 ("s390: scrub registers on kernel entry and KVM exit") Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
d5feec04 |
|
22-Feb-2018 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: do not bypass BPENTER for interrupt system calls The system call path can be interrupted before the switch back to the standard branch prediction with BPENTER has been done. The critical section cleanup code skips forward to .Lsysc_do_svc and bypasses the BPENTER. In this case the kernel and all subsequent code will run with the limited branch prediction. Fixes: eacf67eb9b32 ("s390: run user space and KVM guests with modified branch prediction") Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
dc24b7b4 |
|
19-Feb-2018 |
Hendrik Brueckner <brueckner@linux.vnet.ibm.com> |
s390/clean-up: use CFI_* macros in entry.S Commit f19fbd5ed642 ("s390: introduce execute-trampolines for branches") introduces .cfi_* assembler directives. Instead of using the directives directly, use the macros from asm/dwarf.h. This also ensures that the dwarf debug information are created in the .debug_frame section. Fixes: f19fbd5ed642 ("s390: introduce execute-trampolines for branches") Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
f19fbd5e |
|
25-Jan-2018 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: introduce execute-trampolines for branches Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and -mfunction_return= compiler options to create a kernel fortified against the specte v2 attack. With CONFIG_EXPOLINE=y all indirect branches will be issued with an execute type instruction. For z10 or newer the EXRL instruction will be used, for older machines the EX instruction. The typical indirect call basr %r14,%r1 is replaced with a PC relative call to a new thunk brasl %r14,__s390x_indirect_jump_r1 The thunk contains the EXRL/EX instruction to the indirect branch __s390x_indirect_jump_r1: exrl 0,0f j . 0: br %r1 The detour via the execute type instruction has a performance impact. To get rid of the detour the new kernel parameter "nospectre_v2" and "spectre_v2=[on,off,auto]" can be used. If the parameter is specified the kernel and module code will be patched at runtime. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
6b73044b |
|
15-Jan-2018 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: run user space and KVM guests with modified branch prediction Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary plumbing in entry.S to be able to run user space and KVM guests with limited branch prediction. To switch a user space process to limited branch prediction the s390_isolate_bp() function has to be call, and to run a vCPU of a KVM guest associated with the current task with limited branch prediction call s390_isolate_bp_guest(). Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
d768bd89 |
|
15-Jan-2018 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: add options to change branch prediction behaviour for the kernel Add the PPA instruction to the system entry and exit path to switch the kernel to a different branch prediction behaviour. The instructions are added via CPU alternatives and can be disabled with the "nospec" or the "nobp=0" kernel parameter. If the default behaviour selected with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be used to enable the changed kernel branch prediction. Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
7041d281 |
|
16-Jan-2018 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: scrub registers on kernel entry and KVM exit Clear all user space registers on entry to the kernel and all KVM guest registers on KVM guest exit if the register does not contain either a parameter or a result value. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
4381f9f1 |
|
11-Dec-2017 |
Hendrik Brueckner <brueckner@linux.vnet.ibm.com> |
s390/syscalls: use generated syscall_table.h and unistd.h header files Update the uapi/asm/unistd.h to include the generated compat and 64-bit version of the unistd.h and, as well as, the unistd_nr.h header file. Also remove the arch/s390/kernel/syscalls.S file and use the generated system call table, syscall_table.h, instead. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
3241d3eb |
|
16-Nov-2017 |
Heiko Carstens <hca@linux.ibm.com> |
s390: rework __switch_to() to allow larger task_struct offsets If GCC_PLUGIN_RANDSTRUCT is enabled the members of task_struct will be shuffled around. The offsets of the "pid" and "stack" members within task_struct may not necessarily fit into 12 bits anymore, which causes compile errors within __switch_to, since instructions are used, which only have a 12 bit displacement field. Therefore rework __switch_to, to allow for larger offsets. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
0aaba41b |
|
21-Aug-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: remove all code using the access register mode The vdso code for the getcpu() and the clock_gettime() call use the access register mode to access the per-CPU vdso data page with the current code. An alternative to the complicated AR mode is to use the secondary space mode. This makes the vdso faster and quite a bit simpler. The downside is that the uaccess code has to be changed quite a bit. Which instructions are used depends on the machine and what kind of uaccess operation is requested. The instruction dictates which ASCE value needs to be loaded into %cr1 and %cr7. The different cases: * User copy with MVCOS for z10 and newer machines The MVCOS instruction can copy between the primary space (aka user) and the home space (aka kernel) directly. For set_fs(KERNEL_DS) the kernel ASCE is loaded into %cr1. For set_fs(USER_DS) the user space is already loaded in %cr1. * User copy with MVCP/MVCS for older machines To be able to execute the MVCP/MVCS instructions the kernel needs to switch to primary mode. The control register %cr1 has to be set to the kernel ASCE and %cr7 to either the kernel ASCE or the user ASCE dependent on set_fs(KERNEL_DS) vs set_fs(USER_DS). * Data access in the user address space for strnlen / futex To use "normal" instruction with data from the user address space the secondary space mode is used. The kernel needs to switch to primary mode, %cr1 has to contain the kernel ASCE and %cr7 either the user ASCE or the kernel ASCE, dependent on set_fs. To load a new value into %cr1 or %cr7 is an expensive operation, the kernel tries to be lazy about it. E.g. for multiple user copies in a row with MVCP/MVCS the replacement of the vdso ASCE in %cr7 with the user ASCE is done only once. On return to user space a CPU bit is checked that loads the vdso ASCE again. To enable and disable the data access via the secondary space two new functions are added, enable_sacf_uaccess and disable_sacf_uaccess. The fact that a context is in secondary space uaccess mode is stored in the mm_segment_t value for the task. The code of an interrupt may use set_fs as long as it returns to the previous state it got with get_fs with another call to set_fs. The code in finish_arch_post_lock_switch simply has to do a set_fs with the current mm_segment_t value for the task. For CPUs with MVCOS: CPU running in | %cr1 ASCE | %cr7 ASCE | --------------------------------------|-----------|-----------| user space | user | vdso | kernel, USER_DS, normal-mode | user | vdso | kernel, USER_DS, normal-mode, lazy | user | user | kernel, USER_DS, sacf-mode | kernel | user | kernel, KERNEL_DS, normal-mode | kernel | vdso | kernel, KERNEL_DS, normal-mode, lazy | kernel | kernel | kernel, KERNEL_DS, sacf-mode | kernel | kernel | For CPUs without MVCOS: CPU running in | %cr1 ASCE | %cr7 ASCE | --------------------------------------|-----------|-----------| user space | user | vdso | kernel, USER_DS, normal-mode | user | vdso | kernel, USER_DS, normal-mode lazy | kernel | user | kernel, USER_DS, sacf-mode | kernel | user | kernel, KERNEL_DS, normal-mode | kernel | vdso | kernel, KERNEL_DS, normal-mode, lazy | kernel | kernel | kernel, KERNEL_DS, sacf-mode | kernel | kernel | The lines with "lazy" refer to the state after a copy via the secondary space with a delayed reload of %cr1 and %cr7. There are three hardware address spaces that can cause a DAT exception, primary, secondary and home space. The exception can be related to four different fault types: user space fault, vdso fault, kernel fault, and the gmap faults. Dependent on the set_fs state and normal vs. sacf mode there are a number of fault combinations: 1) user address space fault via the primary ASCE 2) gmap address space fault via the primary ASCE 3) kernel address space fault via the primary ASCE for machines with MVCOS and set_fs(KERNEL_DS) 4) vdso address space faults via the secondary ASCE with an invalid address while running in secondary space in problem state 5) user address space fault via the secondary ASCE for user-copy based on the secondary space mode, e.g. futex_ops or strnlen_user 6) kernel address space fault via the secondary ASCE for user-copy with secondary space mode with set_fs(KERNEL_DS) 7) kernel address space fault via the primary ASCE for user-copy with secondary space mode with set_fs(USER_DS) on machines without MVCOS. 8) kernel address space fault via the home space ASCE Replace user_space_fault() with a new function get_fault_type() that can distinguish all four different fault types. With these changes the futex atomic ops from the kernel and the strnlen_user will get a little bit slower, as well as the old style uaccess with MVCP/MVCS. All user accesses based on MVCOS will be as fast as before. On the positive side, the user space vdso code is a lot faster and Linux ceases to use the complicated AR mode. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
c771320e |
|
05-Oct-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/mm,kvm: improve detection of KVM guest faults The identification of guest fault currently relies on the PF_VCPU flag. This is set in guest_entry_irqoff and cleared in guest_exit_irqoff. Both functions are called by __vcpu_run, the PF_VCPU flag is set for quite a lot of kernel code outside of the guest execution. Replace the PF_VCPU scheme with the PIF_GUEST_FAULT in the pt_regs and make the program check handler code in entry.S set the bit only for exception that occurred between the .Lsie_gmap and .Lsie_done labels. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
2a2d7bef |
|
26-Oct-2017 |
Vasily Gorbik <gor@linux.vnet.ibm.com> |
s390/nmi: avoid using long-displacement facility __LC_MCESAD is currently 4528 /* offsetof(struct lowcore, mcesad) */ that would require long-displacement facility for lg, which we don't have on z900. Fixes: 3037a52f9846 ("s390/nmi: do register validation as early as possible") Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
b2441318 |
|
01-Nov-2017 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
0a5e2ec2 |
|
05-Oct-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/kvm: fix detection of guest machine checks The new detection code for guest machine checks added a check based on %r11 to .Lcleanup_sie to distinguish between normal asynchronous interrupts and machine checks. But the funtion is called from the program check handler as well with an undefined value in %r11. The effect is that all program exceptions pointing to the SIE instruction will set the CIF_MCCK_GUEST bit. The bit stays set for the CPU until the next machine check comes in which will incorrectly be interpreted as a guest machine check. The simplest fix is to stop using .Lcleanup_sie in the program check handler and duplicate a few instructions. Fixes: c929500d7a5a ("s390/nmi: s390: New low level handling for machine check happening in guest") Cc: <stable@vger.kernel.org> # v4.13+ Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
3037a52f |
|
12-Oct-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/nmi: do register validation as early as possible The validation of the CPU registers in the machine check handler is currently split into two parts. The first part is done at the start of the low level mcck_int_handler function, this includes the CPU timer register and the general purpose registers. The second part is done a bit later in s390_do_machine_check for all the other registers, including the control registers, floating pointer control, vector or floating pointer registers, the access registers, the guarded storage registers, the TOD programmable registers and the clock comparator. This is working fine to far but in theory a future extensions could cause the C code to use registers that are not validated yet. A better approach is to validate all CPU registers in "safe" assembler code before any C function is called. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
c929500d |
|
07-Jun-2017 |
QingFeng Hao <haoqf@linux.vnet.ibm.com> |
s390/nmi: s390: New low level handling for machine check happening in guest Add the logic to check if the machine check happens when the guest is running. If yes, set the exit reason -EINTR in the machine check's interrupt handler. Refactor s390_do_machine_check to avoid panicing the host for some kinds of machine checks which happen when guest is running. Reinject the instruction processing damage's machine checks including Delayed Access Exception instead of damaging the host if it happens in the guest because it could be caused by improper update on TLB entry or other software case and impacts the guest only. Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
#
f044f4c58 |
|
12-Jun-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/fpu: export save_fpu_regs for all configs The save_fpu_regs function is a general API that is supposed to be usable for modules as well. Remove the #ifdef that hides the symbol for CONFIG_KVM=n. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
23fefe11 |
|
07-Jun-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/kvm: avoid global config of vm.alloc_pgste=1 The system control vm.alloc_pgste is used to control the size of the page tables, either 2K or 4K. The idea is that a KVM host sets the vm.alloc_pgste control to 1 which causes *all* new processes to run with 4K page tables. For a non-kvm system the control should stay off to save on memory used for page tables. Trouble is that distributions choose to set the control globally to be able to run KVM guests. This wastes memory on non-KVM systems. Introduce the PT_S390_PGSTE ELF segment type to "mark" the qemu executable with it. All executables with this (empty) segment in its ELF phdr array will be started with 4K page tables. Any executable without PT_S390_PGSTE will run with the default 2K page tables. This removes the need to set vm.alloc_pgste=1 for a KVM host and minimizes the waste of memory for page tables. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
c0e7bb38 |
|
15-May-2017 |
Christian Borntraeger <borntraeger@de.ibm.com> |
s390/kvm: do not rely on the ILC on kvm host protection fauls For most cases a protection exception in the host (e.g. copy on write or dirty tracking) on the sie instruction will indicate an instruction length of 4. Turns out that there are some corner cases (e.g. runtime instrumentation) where this is not necessarily true and the ILC is unpredictable. Let's replace our 4 byte rewind_pad with 3 byte nops to prepare for all possible ILCs. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
07a63cbe |
|
02-May-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/cputime: fix incorrect system time git commit c5328901aa1db134 "[S390] entry[64].S improvements" removed the update of the exit_timer lowcore field from the critical section cleanup of the .Lsysc_restore/.Lsysc_done and .Lio_restore/.Lio_done blocks. If the PSW is updated by the critical section cleanup to point to user space again, the interrupt entry code will do a vtime calculation after the cleanup completed with an exit_timer value which has *not* been updated. Due to this incorrect system time deltas are calculated. If an interrupt occured with an old PSW between .Lsysc_restore/.Lsysc_done or .Lio_restore/.Lio_done update __LC_EXIT_TIMER with the system entry time of the interrupt. Cc: stable@vger.kernel.org # 3.3+ Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
df26c2e8 |
|
03-Apr-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/cpumf: simplify detection of guest samples There are three different code levels in regard to the identification of guest samples. They differ in the way the LPP instruction is used. 1) Old kernels without the LPP instruction. The guest program parameter is always zero. 2) Newer kernels load the process pid into the program parameter with LPP. The guest program parameter is non-zero if the guest executes in a process != idle. 3) The latest kernels load ((1UL << 31) | pid) with LPP to make the value non-zero even for the idle task. The guest program parameter is non-zero if the guest is running. All kernels load the process pid to CR4 on context switch. The CPU sampling code uses the value in CR4 to decide between guest and host samples in case the guest program parameter is zero. The three cases: 1) CR4==pid, gpp==0 2) CR4==pid, gpp==pid 3) CR4==pid, gpp==((1UL << 31) | pid) The load-control instruction to load the pid into CR4 is expensive and the goal is to remove it. To distinguish the host CR4 from the guest pid for the idle process the maximum value 0xffff for the PASN is used. This adds a fourth case for a guest OS with an updated kernel: 4) CR4==0xffff, gpp=((1UL << 31) | pid) The host kernel will have CR4==0xffff and will use (gpp!=0 || CR4!==0xffff) to identify guest samples. This works nicely with all 4 cases, the only possible issue would be a guest with an old kernel (gpp==0) and a process pid of 0xffff. Well, don't do that.. Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
cab36c26 |
|
03-Apr-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: use 64-bit lctlg to load task pid to cr4 on context switch The 32-bit lctl instruction is quite a bit slower than the 64-bit counter part lctlg. Use the faster instruction. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
916cda1a |
|
26-Jan-2016 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: add a system call for guarded storage This adds a new system call to enable the use of guarded storage for user space processes. The system call takes two arguments, a command and pointer to a guarded storage control block: s390_guarded_storage(int command, struct gs_cb *gs_cb); The second argument is relevant only for the GS_SET_BC_CB command. The commands in detail: 0 - GS_ENABLE Enable the guarded storage facility for the current task. The initial content of the guarded storage control block will be all zeros. After the enablement the user space code can use load-guarded-storage-controls instruction (LGSC) to load an arbitrary control block. While a task is enabled the kernel will save and restore the current content of the guarded storage registers on context switch. 1 - GS_DISABLE Disables the use of the guarded storage facility for the current task. The kernel will cease to save and restore the content of the guarded storage registers, the task specific content of these registers is lost. 2 - GS_SET_BC_CB Set a broadcast guarded storage control block. This is called per thread and stores a specific guarded storage control block in the task struct of the current task. This control block will be used for the broadcast event GS_BROADCAST. 3 - GS_CLEAR_BC_CB Clears the broadcast guarded storage control block. The guarded- storage control block is removed from the task struct that was established by GS_SET_BC_CB. 4 - GS_BROADCAST Sends a broadcast to all thread siblings of the current task. Every sibling that has established a broadcast guarded storage control block will load this control block and will be enabled for guarded storage. The broadcast guarded storage control block is used up, a second broadcast without a refresh of the stored control block with GS_SET_BC_CB will not have any effect. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
2f09ca60 |
|
13-Feb-2017 |
Miroslav Benes <mbenes@suse.cz> |
livepatch/s390: add TIF_PATCH_PENDING thread flag Update a task's patch state when returning from a system call or user space interrupt, or after handling a signal. This greatly increases the chances of a patch operation succeeding. If a task is I/O bound, it can be patched when returning from a system call. If a task is CPU bound, it can be patched when returning from an interrupt. If a task is sleeping on a to-be-patched function, the user can send SIGSTOP and SIGCONT to force it to switch. Since there are two ways the syscall can be restarted on return from a signal handling process, it is important to clear the flag before do_signal() is called. Otherwise we could miss the migration if we used SIGSTOP/SIGCONT procedure or fake signal to migrate patching blocking tasks. If we place our hook to sysc_work label in entry before TIF_SIGPENDING is evaluated we kill two birds with one stone. The task is correctly migrated in all return paths from a syscall. Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
d9fcf2a1 |
|
27-Feb-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: fix in-kernel program checks A program check inside the kernel takes a slightly different path in entry.S compare to a normal user fault. A recent change moved the store of the breaking event address into the path taken for in-kernel program checks as well, but %r14 has not been setup to point to the correct location. A wild store is the consequence. Move the store of the breaking event address to the code path for user space faults. Fixes: 34525e1f7e8d ("s390: store breaking event address only for program checks") Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
b5a882fc |
|
17-Feb-2017 |
Heiko Carstens <hca@linux.ibm.com> |
s390: restore address space when returning to user space Unbalanced set_fs usages (e.g. early exit from a function and a forgotten set_fs(USER_DS) call) may lead to a situation where the secondary asce is the kernel space asce when returning to user space. This would allow user space to modify kernel space at will. This would only be possible with the above mentioned kernel bug, however we can detect this and fix the secondary asce before returning to user space. Therefore a new TIF_ASCE_SECONDARY which is used within set_fs. When returning to user space check if TIF_ASCE_SECONDARY is set, which would indicate a bug. If it is set print a message to the console, fixup the secondary asce, and then return to user space. This is similar to what is being discussed for x86 and arm: "[RFC] syscalls: Restore address limit after a syscall". Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
606aa4aa |
|
17-Feb-2017 |
Heiko Carstens <hca@linux.ibm.com> |
s390: rename CIF_ASCE to CIF_ASCE_PRIMARY This is just a preparation patch in order to keep the "restore address space after syscall" patch small. Rename CIF_ASCE to CIF_ASCE_PRIMARY to be unique and specific when introducing a second CIF_ASCE_SECONDARY CIF flag. Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
d24b98e3 |
|
20-Feb-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/syscall: fix single stepped system calls Fix PER tracing of system calls after git commit 34525e1f7e8dc478 "s390: store breaking event address only for program checks" broke it. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
57d7f939 |
|
22-Mar-2016 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: add no-execute support Bit 0x100 of a page table, segment table of region table entry can be used to disallow code execution for the virtual addresses associated with the entry. There is one tricky bit, the system call to return from a signal is part of the signal frame written to the user stack. With a non-executable stack this would stop working. To avoid breaking things the protection fault handler checks the opcode that caused the fault for 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn) and injects a system call. This is preferable to the alternative solution with a stub function in the vdso because it works for vdso=off and statically linked binaries as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
34525e1f |
|
24-Jan-2017 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: store breaking event address only for program checks The principles of operations specifies that the breaking event address is stored to the address 0x110 in the prefix page only for program checks. The last branch in user space is lost as soon as a branch in kernel space is executed after e.g. an svc. This makes it impossible to accurately maintain the breaking event address for a user space process. Simplify the code, just copy the current breaking event address from 0x110 to the task structure for program checks from user space. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
7df11604 |
|
07-Dec-2016 |
Heiko Carstens <hca@linux.ibm.com> |
s390: remove unused labels from entry.S Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
ce4dda3f |
|
02-Dec-2016 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: fix machine check panic stack switch For system damage machine checks or machine checks due to invalid PSW fields the system will be stopped. In order to get an oops message out before killing the system the machine check handler branches to .Lmcck_panic, switches to the panic stack and then does the usual machine check handling. The switch to the panic stack is incomplete, the stack pointer in %r15 is replaced, but the pt_regs pointer in %r11 is not. The result is a program check which will kill the system in a slightly different way. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
61aaef51 |
|
25-Nov-2016 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: fix kernel oops for CONFIG_MARCH_Z900=y builds The LAST_BREAK macro in entry.S uses a different instruction sequence for CONFIG_MARCH_Z900 builds. The branch target offset to skip the store of the last breaking event address needs to take the different length of the code block into account. Fixes: f8fc82b47149e344 ("s390: move sys_call_table and last_break from thread_info to thread_struct") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
3a890380 |
|
14-Nov-2016 |
Heiko Carstens <hca@linux.ibm.com> |
s390/thread_info: get rid of THREAD_ORDER define We have the s390 specific THREAD_ORDER define and the THREAD_SIZE_ORDER define which is also used in common code. Both have exactly the same semantics. Therefore get rid of THREAD_ORDER and always use THREAD_SIZE_ORDER instead. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
ef280c85 |
|
07-Nov-2016 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: move sys_call_table and last_break from thread_info to thread_struct Move the last two architecture specific fields from the thread_info structure to the thread_struct. All that is left in thread_info is the flags field. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
d5c352cd |
|
08-Nov-2016 |
Heiko Carstens <hca@linux.ibm.com> |
s390: move thread_info into task_struct This is the s390 variant of commit 15f4eae70d36 ("x86: Move thread_info into task_struct"). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
c360192b |
|
24-Oct-2016 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/preempt: move preempt_count to the lowcore Convert s390 to use a field in the struct lowcore for the CPU preemption count. It is a bit cheaper to access a lowcore field compared to a thread_info variable and it removes the depencency on a task related structure. bloat-o-meter on the vmlinux image for the default configuration (CONFIG_PREEMPT_NONE=y) reports a small reduction in text size: add/remove: 0/0 grow/shrink: 18/578 up/down: 228/-5448 (-5220) A larger improvement is achieved with the default configuration but with CONFIG_PREEMPT=y and CONFIG_DEBUG_PREEMPT=n: add/remove: 2/6 grow/shrink: 59/4477 up/down: 1618/-228762 (-227144) Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
711f5df7 |
|
12-Jan-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
s390: move exports to definitions Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
46210c44 |
|
29-Jun-2016 |
Heiko Carstens <hca@linux.ibm.com> |
s390: have unique symbol for __switch_to address After linking there are several symbols for the same address that the __switch_to symbol points to. E.g.: 000000000089b9c0 T __kprobes_text_start 000000000089b9c0 T __lock_text_end 000000000089b9c0 T __lock_text_start 000000000089b9c0 T __sched_text_end 000000000089b9c0 T __switch_to When disassembling with "objdump -d" this results in a missing __switch_to function. It would be named __kprobes_text_start instead. To unconfuse objdump add a nop in front of the kprobes text section. That way __switch_to appears again. Obviously this solution is sort of a hack, since it also depends on link order if this works or not. However it is the best I can come up with for now. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
43799597 |
|
24-Jun-2016 |
Heiko Carstens <hca@linux.ibm.com> |
s390: remove pointless load within __switch_to Remove a leftover from the code that transferred a couple of TIF bits from the previous task to the next task. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
e370e476 |
|
10-Mar-2016 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: fix floating pointer register corruption (again) There is a tricky interaction between the machine check handler and the critical sections of load_fpu_regs and save_fpu_regs functions. If the machine check interrupts one of the two functions the critical section cleanup will complete the function before the machine check handler s390_do_machine_check is called. Trouble is that the machine check handler needs to validate the floating point registers *before* and not *after* the completion of load_fpu_regs/save_fpu_regs. The simplest solution is to rewind the PSW to the start of the load_fpu_regs/save_fpu_regs and retry the function after the return from the machine check handler. Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: <stable@vger.kernel.org> # 4.3+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
b1685ab9 |
|
29-Feb-2016 |
Christian Borntraeger <borntraeger@de.ibm.com> |
s390/cpumf: Improve guest detection heuristics commit e22cf8ca6f75 ("s390/cpumf: rework program parameter setting to detect guest samples") requires guest changes to get proper guest/host. We can do better: We can use the primary asn value, which is set on all Linux variants to compare this with the host pp value. We now have the following cases: 1. Guest using PP host sample: gpp == 0, asn == hpp --> host guest sample: gpp != 0 --> guest 2. Guest not using PP host sample: gpp == 0, asn == hpp --> host guest sample: gpp == 0, asn != hpp --> guest As soon as the host no longer sets CR4, we must back out this heuristics - let's add a comment in switch_to. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
419123f9 |
|
19-Nov-2015 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/spinlock: do not yield to a CPU in udelay/mdelay It does not make sense to try to relinquish the time slice with diag 0x9c to a CPU in a state that does not allow to schedule the CPU. The scenario where this can happen is a CPU waiting in udelay/mdelay while holding a spin-lock. Add a CIF bit to tag a CPU in enabled wait and use it to detect that the yield of a CPU will not be successful and skip the diagnose call. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
db7e007f |
|
15-Aug-2015 |
Heiko Carstens <hca@linux.ibm.com> |
s390/udelay: make udelay have busy loop semantics When using systemtap it was observed that our udelay implementation is rather suboptimal if being called from a kprobe handler installed by systemtap. The problem observed when a kprobe was installed on lock_acquired(). When the probe was hit the kprobe handler did call udelay, which set up an (internal) timer and reenabled interrupts (only the clock comparator interrupt) and waited for the interrupt. This is an optimization to avoid that the cpu is busy looping while waiting that enough time passes. The problem is that the interrupt handler still does call irq_enter()/irq_exit() which then again can lead to a deadlock, since some accounting functions may take locks as well. If one of these locks is the same, which caused lock_acquired() to be called, we have a nice deadlock. This patch reworks the udelay code for the interrupts disabled case to immediately leave the low level interrupt handler when the clock comparator interrupt happens. That way no C code is being called and the deadlock cannot happen anymore. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
e22cf8ca |
|
06-Oct-2015 |
Christian Borntraeger <borntraeger@de.ibm.com> |
s390/cpumf: rework program parameter setting to detect guest samples The program parameter can be used to mark hardware samples with some token. Previously, it was used to mark guest samples only. Improve the program parameter doubleword by combining two parts, the leftmost LPP part and the rightmost PID part. Set the PID part for processes by using the task PID. To distinguish host and guest samples for the kernel (PID part is zero), the guest must always set the program paramater to a non-zero value. Use the leftmost bit in the LPP part of the program parameter to be able to detect guest kernel samples. [brueckner@linux.vnet.ibm.com]: Split __LC_CURRENT and introduced __LC_LPP. Corrected __LC_CURRENT users and adjusted assembler parts. And updated the commit message accordingly. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
83abeffb |
|
01-Oct-2015 |
Hendrik Brueckner <brueckner@linux.vnet.ibm.com> |
s390/entry: add assembler macro to conveniently tests under mask Various functions in entry.S perform test-under-mask instructions to test for particular bits in memory. Because test-under-mask uses a mask value of one byte, the mask value and the offset into the memory must be calculated manually. This easily introduces errors and is hard to review and read. Introduce the TSTMSK assembler macro to specify a mask constant and let the macro calculate the offset and the byte mask to generate a test-under-mask instruction. The benefit is that existing symbolic constants can now be used for tests. Also the macro checks for zero mask values and mask values that consist of multiple bytes. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
0ac27779 |
|
29-Sep-2015 |
Hendrik Brueckner <brueckner@linux.vnet.ibm.com> |
s390/fpu: add static FPU save area for init_task Previously, the init task did not have an allocated FPU save area and saving an FPU state was not possible. Now if the vector extension is always enabled, provide a static FPU save area to save FPU states of vector instructions that can be executed quite early. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
b5510d9b |
|
29-Sep-2015 |
Hendrik Brueckner <brueckner@linux.vnet.ibm.com> |
s390/fpu: always enable the vector facility if it is available If the kernel detects that the s390 hardware supports the vector facility, it is enabled by default at an early stage. To force it off, use the novx kernel parameter. Note that there is a small time window, where the vector facility is enabled before it is forced to be off. With enabling the vector facility by default, the FPU save and restore functions can be improved. They do not longer require to manage expensive control register updates to enable or disable the vector enablement control for particular processes. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
72d38b19 |
|
18-Sep-2015 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/vtime: correct scaled cputime of partially idle CPUs The calculation for the SMT scaling factor for a hardware thread which has been partially idle needs to disregard the cycles spent by the other threads of the core while the thread is idle. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
9380cf5a |
|
09-Sep-2015 |
Heiko Carstens <hca@linux.ibm.com> |
s390: fix floating point register corruption The critical section cleanup code misses to add the offset of the thread_struct to the task address. Therefore, if the critical section code gets executed, it may corrupt the task struct or restore the contents of the floating point registers from the wrong memory location. Fixes d0164ee20d "s390/kernel: remove save_fpu_regs() parameter and use __LC_CURRENT instead". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
888d5e98 |
|
09-Jul-2015 |
Christian Borntraeger <borntraeger@de.ibm.com> |
KVM: s390: use pid of cpu thread for sampling tagging Right now we use the address of the sie control block as tag for the sampling data. This is hard to get for users. Let's just use the PID of the cpu thread to mark the hardware samples. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
d0164ee2 |
|
29-Jun-2015 |
Hendrik Brueckner <brueckner@linux.vnet.ibm.com> |
s390/kernel: remove save_fpu_regs() parameter and use __LC_CURRENT instead All calls to save_fpu_regs() specify the fpu structure of the current task pointer as parameter. The task pointer of the current task can also be retrieved from the CPU lowcore directly. Remove the parameter definition, load the __LC_CURRENT task pointer from the CPU lowcore, and rebase the FPU structure onto the task structure. Apply the same approach for the load_fpu_regs() function. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
2acb94f4 |
|
22-Jun-2015 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/nmi: use the normal asynchronous stack for machine checks If a machine checks is received while the CPU is in the kernel, only the s390_do_machine_check function will be called. The call to s390_handle_mcck is postponed until the CPU returns to user space. Because of this it is safe to use the asynchronous stack for machine checks even if the CPU is already handling an interrupt. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
a359bb11 |
|
22-Jun-2015 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/kernel: squeeze a few more cycles out of the system call handler Reorder the instructions of UPDATE_VTIME to improve superscalar execution, remove duplicate checks for problem-state from the asynchronous interrupt handlers, and move the check for problem-state from the synchronous exit path to the program check path as it is only needed for program checks inside the kernel. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
d0fc4107 |
|
22-Jun-2015 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/kvm: integrate HANDLE_SIE_INTERCEPT into cleanup_critical Currently there are two mechanisms to deal with cleanup work due to interrupts. The HANDLE_SIE_INTERCEPT macro is used to undo the changes required to enter SIE in sie64a. If the SIE instruction causes a program check, or an asynchronous interrupt is received the HANDLE_SIE_INTERCEPT code forwards the program execution to sie_exit. All the other critical sections in entry.S are handled by the code in cleanup_critical that is called by the SWITCH_ASYNC macro. Move the sie64a function to the beginning of the critical section and add the code from HANDLE_SIE_INTERCEPT to cleanup_critical. Add a special case for the sie64a cleanup to the program check handler. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
dcd2a9aa |
|
22-Jun-2015 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/kvm: fix interrupt race with HANDLE_SIE_INTERCEPT The HANDLE_SIE_INTERCEPT macro is used in the interrupt handlers and the program check handler to undo a few changes done by sie64a. Among them are guest vs host LPP, the gmap ASCE vs kernel ASCE and the bit that indicates that SIE is currently running on the CPU. There is a race of a voluntary SIE exit vs asynchronous interrupts. If the CPU completed the SIE instruction and the TM instruction of the LPP macro at the time it receives an interrupt, the interrupt handler will run while the LPP, the ASCE and the SIE bit are still set up for guest execution. This might result in wrong sampling data, but it will not cause data corruption or lockups. The critical section in sie64a needs to be enlarged to include all instructions that undo the changes required for guest execution. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
9977e886 |
|
09-Jun-2015 |
Hendrik Brueckner <brueckner@linux.vnet.ibm.com> |
s390/kernel: lazy restore fpu registers Improve the save and restore behavior of FPU register contents to use the vector extension within the kernel. The kernel does not use floating-point or vector registers and, therefore, saving and restoring the FPU register contents are performed for handling signals or switching processes only. To prepare for using vector instructions and vector registers within the kernel, enhance the save behavior and implement a lazy restore at return to user space from a system call or interrupt. To implement the lazy restore, the save_fpu_regs() sets a CPU information flag, CIF_FPU, to indicate that the FPU registers must be restored. Saving and setting CIF_FPU is performed in an atomic fashion to be interrupt-safe. When the kernel wants to use the vector extension or wants to change the FPU register state for a task during signal handling, the save_fpu_regs() must be called first. The CIF_FPU flag is also set at process switch. At return to user space, the FPU state is restored. In particular, the FPU state includes the floating-point or vector register contents, as well as, vector-enablement and floating-point control. The FPU state restore and clearing CIF_FPU is also performed in an atomic fashion. For KVM, the restore of the FPU register state is performed when restoring the general-purpose guest registers before the SIE instructions is started. Because the path towards the SIE instruction is interruptible, the CIF_FPU flag must be checked again right before going into SIE. If set, the guest registers must be reloaded again by re-entering the outer SIE loop. This is the same behavior as if the SIE critical section is interrupted. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
3827ec3d |
|
20-Jul-2015 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: adapt entry.S to the move of thread_struct git commit 0c8c0f03e3a292e031596484275c14cf39c0ab7a "x86/fpu, sched: Dynamically allocate 'struct fpu'" moved the thread_struct to the end of the task_struct. This causes some of the offsets used in entry.S to overflow their instruction operand field. To fix this use aghi to create a dedicated pointer for the thread_struct. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
8e236546 |
|
09-Apr-2015 |
Christian Borntraeger <borntraeger@de.ibm.com> |
KVM: s390: make exit_sie_sync more robust exit_sie_sync is used to kick CPUs out of SIE and prevent reentering at any point in time. This is used to reload the prefix pages and to set the IBS stuff in a way that guarantees that after this function returns we are no longer in SIE. All current users trigger KVM requests. The request must be set before we block the CPUs to avoid races. Let's make this implicit by adding the request into a new function kvm_s390_sync_requests that replaces exit_sie_sync and split out s390_vcpu_block and s390_vcpu_unblock, that can be used to keep CPUs out of SIE independent of requests. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
|
#
a876cb3f |
|
13-Feb-2015 |
Heiko Carstens <hca@linux.ibm.com> |
s390: remove 31 bit syscalls Remove the 31 bit syscalls from the syscall table. This is a separate patch just in case I screwed something up so it can be easily reverted. However the conversion was done with a script, so everything should be ok. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
4bfc86ce |
|
13-Feb-2015 |
Heiko Carstens <hca@linux.ibm.com> |
s390: remove "64" suffix from a couple of files Rename a couple of files to get rid of the "64" suffix. "git blame" will still work. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
5a79859a |
|
12-Feb-2015 |
Heiko Carstens <hca@linux.ibm.com> |
s390: remove 31 bit support Remove the 31 bit support in order to reduce maintenance cost and effectively remove dead code. Since a couple of years there is no distribution left that comes with a 31 bit kernel. The 31 bit kernel also has been broken since more than a year before anybody noticed. In addition I added a removal warning to the kernel shown at ipl for 5 minutes: a960062e5826 ("s390: add 31 bit warning message") which let everybody know about the plan to remove 31 bit code. We didn't get any response. Given that the last 31 bit only machine was introduced in 1999 let's remove the code. Anybody with 31 bit user space code can still use the compat mode. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
86ed42f4 |
|
03-Dec-2014 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: use local symbol names in entry[64].S To improve the output of the perf tool hide most of the symbols from entry[64].S by using the '.L' prefix. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
d3a73acb |
|
14-Apr-2014 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: split TIF bits into CIF, PIF and TIF bits The oi and ni instructions used in entry[64].S to set and clear bits in the thread-flags are not guaranteed to be atomic in regard to other CPUs. Split the TIF bits into CPU, pt_regs and thread-info specific bits. Updates on the TIF bits are done with atomic instructions, updates on CPU and pt_regs bits are done with non-atomic instructions. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
beef560b |
|
14-Apr-2014 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/uaccess: simplify control register updates Always switch to the kernel ASCE in switch_mm. Load the secondary space ASCE in finish_arch_post_lock_switch after checking that any pending page table operations have completed. The primary ASCE is loaded in entry[64].S. With this the update_primary_asce call can be removed from the switch_to macro and from the start of switch_mm function. Remove the load_primary argument from update_user_asce/clear_user_asce, rename update_user_asce to set_user_asce and rename update_primary_asce to load_kernel_asce. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
21ee7ffd |
|
26-Feb-2014 |
Jens Freimann <jfrei@linux.vnet.ibm.com> |
s390: rename and split lowcore field per_perc_atmid per_perc_atmid is currently a two-byte field that combines two fields, the PER code and the PER Addressing-and-Translation-Mode Identification (ATMID) Let's make them accessible indepently and also rename per_cause to per_code. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
#
457f2180 |
|
21-Mar-2014 |
Heiko Carstens <hca@linux.ibm.com> |
s390/uaccess: rework uaccess code - fix locking issues The current uaccess code uses a page table walk in some circumstances, e.g. in case of the in atomic futex operations or if running on old hardware which doesn't support the mvcos instruction. However it turned out that the page table walk code does not correctly lock page tables when accessing page table entries. In other words: a different cpu may invalidate a page table entry while the current cpu inspects the pte. This may lead to random data corruption. Adding correct locking however isn't trivial for all uaccess operations. Especially copy_in_user() is problematic since that requires to hold at least two locks, but must be protected against ABBA deadlock when a different cpu also performs a copy_in_user() operation. So the solution is a different approach where we change address spaces: User space runs in primary address mode, or access register mode within vdso code, like it currently already does. The kernel usually also runs in home space mode, however when accessing user space the kernel switches to primary or secondary address mode if the mvcos instruction is not available or if a compare-and-swap (futex) instruction on a user space address is performed. KVM however is special, since that requires the kernel to run in home address space while implicitly accessing user space with the sie instruction. So we end up with: User space: - runs in primary or access register mode - cr1 contains the user asce - cr7 contains the user asce - cr13 contains the kernel asce Kernel space: - runs in home space mode - cr1 contains the user or kernel asce -> the kernel asce is loaded when a uaccess requires primary or secondary address mode - cr7 contains the user or kernel asce, (changed with set_fs()) - cr13 contains the kernel asce In case of uaccess the kernel changes to: - primary space mode in case of a uaccess (copy_to_user) and uses e.g. the mvcp instruction to access user space. However the kernel will stay in home space mode if the mvcos instruction is available - secondary space mode in case of futex atomic operations, so that the instructions come from primary address space and data from secondary space In case of kvm the kernel runs in home space mode, but cr1 gets switched to contain the gmap asce before the sie instruction gets executed. When the sie instruction is finished cr1 will be switched back to contain the user asce. A context switch between two processes will always load the kernel asce for the next process in cr1. So the first exit to user space is a bit more expensive (one extra load control register instruction) than before, however keeps the code rather simple. In sum this means there is no need to perform any error prone page table walks anymore when accessing user space. The patch seems to be rather large, however it mainly removes the the page table walk code and restores the previously deleted "standard" uaccess code, with a couple of changes. The uaccess without mvcos mode can be enforced with the "uaccess_primary" kernel parameter. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
53e857f3 |
|
10-Sep-2012 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/mm,tlb: race of lazy TLB flush vs. recreation of TLB entries Git commit 050eef364ad70059 "[S390] fix tlb flushing vs. concurrent /proc accesses" introduced the attach counter to avoid using the mm_users value to decide between IPTE for every PTE and lazy TLB flushing with IDTE. That fixed the problem with mm_users but it introduced another subtle race, fortunately one that is very hard to hit. The background is the requirement of the architecture that a valid PTE may not be changed while it can be used concurrently by another cpu. The decision between IPTE and lazy TLB flushing needs to be done while the PTE is still valid. Now if the virtual cpu is temporarily stopped after the decision to use lazy TLB flushing but before the invalid bit of the PTE has been set, another cpu can attach the mm, find that flush_mm is set, do the IDTE, return to userspace, and recreate a TLB that uses the PTE in question. When the first, stopped cpu continues it will change the PTE while it is attached on another cpu. The first cpu will do another IDTE shortly after the modification of the PTE which makes the race window quite short. To fix this race the CPU that wants to attach the address space of a user space thread needs to wait for the end of the PTE modification. The number of concurrent TLB flushers for an mm is tracked in the upper 16 bits of the attach_count and finish_arch_post_lock_switch is used to wait for the end of the flush operation if required. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
dbbfe487 |
|
27-Sep-2013 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: fix system call restart after inferior call Git commit 616498813b11ffef "s390: system call path micro optimization" introduced a regression in regard to system call restarting and inferior function calls via the ptrace interface. The pointer to the system call table needs to be loaded in sysc_sigpending if do_signal returns with TIF_SYSCALl set after it restored a system call context. Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
0587d409 |
|
23-Aug-2013 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/time: return with irqs disabled from psw_idle Modify the psw_idle waiting logic in entry[64].S to return with interrupts disabled. This avoids potential issues with udelay and interrupt loops as interrupts are not reenabled after clock comparator interrupts. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
1f44a225 |
|
27-Jun-2013 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: convert interrupt handling to use generic hardirq With the introduction of PCI it became apparent that s390 should convert to generic hardirqs as too many drivers do not have the correct dependency for GENERIC_HARDIRQS. On the architecture level s390 does not have irq lines. It has external interrupts, I/O interrupts and adapter interrupts. This patch hard-codes all external interrupts as irq #1, all I/O interrupts as irq #2 and all adapter interrupts as irq #3. The additional information from the lowcore associated with the interrupt is stored in the pt_regs of the interrupt frame, where the interrupt handler can pick it up. For PCI/MSI interrupts the adapter interrupt handler scans the relevant bit fields and calls generic_handle_irq with the virtual irq number for the MSI interrupt. Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
48f6b00c |
|
17-Jun-2013 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/irq: store interrupt information in pt_regs Copy the interrupt parameters from the lowcore to the pt_regs structure in entry[64].S and reduce the arguments of the low level interrupt handler to the pt_regs pointer only. In addition move the test-pending-interrupt loop from do_IRQ to entry[64].S to make sure that interrupt information is always delivered via pt_regs. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
61649881 |
|
23-Apr-2013 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: system call path micro optimization Add a pointer to the system call table to the thread_info structure. The TIF_31BIT bit is set or cleared by SET_PERSONALITY exactly once for the lifetime of a process. With the pointer to the correct system call table in thread_info the system call code in entry64.S path can drop the check for TIF_31BIT which saves a couple of instructions. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
dc7ee00d |
|
24-Apr-2013 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: lowcore stack pointer offsets Store the stack pointers in the lowcore for the kernel stack, the async stack and the panic stack with the offset required for the first user. This avoids an unnecessary add instruction on the system call path. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
6551fbdf |
|
28-Feb-2013 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: critical section cleanup vs. machine checks The current machine check code uses the registers stored by the machine in the lowcore at __LC_GPREGS_SAVE_AREA as the registers of the interrupted context. The registers 0-7 of a user process can get clobbered if a machine checks interrupts the execution of a critical section in entry[64].S. The reason is that the critical section cleanup code may need to modify the PSW and the registers for the previous context to get to the end of a critical section. If registers 0-7 have to be replaced the relevant copy will be in the registers, which invalidates the copy in the lowcore. The machine check handler needs to explicitly store registers 0-7 to the stack. Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
39efd4ec |
|
21-Nov-2012 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/ptrace: race of single stepping vs signal delivery The current single step code is racy in regard to concurrent delivery of signals. If a signal is delivered after a PER program check occurred but before the TIF_PER_TRAP bit has been checked in entry[64].S the code clears TIF_PER_TRAP and then calls do_signal. This is wrong, if the instruction completed (or has been suppressed) a SIGTRAP should be delivered to the debugger in any case. Only if the instruction has been nullified the SIGTRAP may not be send. The new logic always sets TIF_PER_TRAP if the program check indicates PER tracing but removes it again for all program checks that are nullifying. The effect is that for each change in the PSW address we now get a single SIGTRAP. Reported-by: Andreas Arnez <arnez@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
30dcb099 |
|
11-Oct-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
s390: switch to saner kernel_execve() semantics Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
f322220d |
|
06-Sep-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
s390: convert to generic kernel_execve() same situation as with alpha and arm - only massage needed Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
37fe5d41 |
|
10-Sep-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
s390: fold kernel_thread_helper() into ret_from_fork() ... and don't bother with syscall return path in case of kernel threads. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
65f22a90 |
|
06-Sep-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
s390: fold execve_tail() into start_thread(), convert to generic sys_execve() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
27f6b416 |
|
20-Jul-2012 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/vtimer: rework virtual timer interface The current virtual timer interface is inherently per-cpu and hard to use. The sole user of the interface is appldata which uses it to execute a function after a specific amount of cputime has been used over all cpus. Rework the virtual timer interface to hook into the cputime accounting. This makes the interface independent from the CPU timer interrupts, and makes the virtual timers global as opposed to per-cpu. Overall the code is greatly simplified. The downside is that the accuracy is not as good as the original implementation, but it is still good enough for appldata. Reviewed-by: Jan Glauber <jang@linux.vnet.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
a53c8fab |
|
20-Jul-2012 |
Heiko Carstens <hca@linux.ibm.com> |
s390/comments: unify copyright messages and remove file names Remove the file name from the comment at top of many files. In most cases the file name was wrong anyway, so it's rather pointless. Also unify the IBM copyright statement. We did have a lot of sightly different statements and wanted to change them one after another whenever a file gets touched. However that never happened. Instead people start to take the old/"wrong" statements to use as a template for new files. So unify all of them in one go. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
fbe76568 |
|
05-Jun-2012 |
Heiko Carstens <hca@linux.ibm.com> |
s390/smp: make absolute lowcore / cpu restart parameter accesses more robust Setting the cpu restart parameters is done in three different fashions: - directly setting the four parameters individually - copying the four parameters with memcpy (using 4 * sizeof(long)) - copying the four parameters using a private structure In addition code in entry*.S relies on a certain order of the restart members of struct _lowcore. Make all of this more robust to future changes by adding a mem_absolute_assign(dest, val) define, which assigns val to dest using absolute addressing mode. Also the load multiple instructions in entry*.S have been split into separate load instruction so the order of the struct _lowcore members doesn't matter anymore. In addition move the prototypes of memcpy_real/absolute from uaccess.h to processor.h. These memcpy* variants are not related to uaccess at all. string.h doesn't seem to match as well, so lets use processor.h. Also replace the eight byte array in struct _lowcore which represents a misaliged u64 with a u64. The compiler will always create code that handles the misaligned u64 correctly. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
eb546195 |
|
04-Jun-2012 |
Heiko Carstens <hca@linux.ibm.com> |
s390/sigp: use sigp order code defines in assembly code Use sigp order code defines in assembly code as well. With this change all places that use sigp constants should have been converted to use self describing defines instead of directly using constants. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
eda0c6d6 |
|
15-May-2012 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390: fix race on TIF_MCCK_PENDING There is a small race window in the __switch_to code in regard to the transfer of the TIF_MCCK_PENDING bit from the previous to the next task. The bit is transferred before the task struct pointer and the thread-info pointer for the next task has been stored to lowcore. If a machine check sets the TIF_MCCK_PENDING bit between the transfer code and the store of current/thread_info the bit is still set for the previous task. And if the previous task has terminated it can get lost. The effect is that a pending CRW is not retrieved until the next machine checks sets TIF_MCCK_PENDING. To fix this reorder __switch_to to first store the task struct and thread-info pointer and then do the transfer of the bit. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
4c1051e3 |
|
11-Mar-2012 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] rework idle code Whenever the cpu loads an enabled wait PSW it will appear as idle to the underlying host system. The code in default_idle calls vtime_stop_cpu which does the necessary voodoo to get the cpu time accounting right. The udelay code just loads an enabled wait PSW. To correct this rework the vtime_stop_cpu/vtime_start_cpu logic and move the difficult parts to entry[64].S, vtime_stop_cpu can now be called from anywhere and vtime_start_cpu is gone. The correction of the cpu time during wakeup from an enabled wait PSW is done with a critical section in entry[64].S. As vtime_start_cpu is gone, s390_idle_check can be removed as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
8b646bd7 |
|
11-Mar-2012 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] rework smp code Define struct pcpu and merge some of the NR_CPUS arrays into it, including __cpu_logical_map, current_set and smp_cpu_state. Split smp related functions to those operating on physical cpus and the functions operating on a logical cpu number. Make the functions for physical cpus use a pointer to a struct pcpu. This hides the knowledge about cpu addresses in smp.c, entry[64].S and swsusp_asm64.S, thus remove the sigp.h header. The PSW restart mechanism is used to start secondary cpus, calling a function on an online cpu, calling a function on the ipl cpu, and for the nmi signal. Replace the different assembler functions with a single function restart_int_handler. The new entry point calls a function whose pointer is stored in the lowcore of the target cpu and it can wait for the source cpu to stop. This covers all existing use cases. Overall the code is now simpler and there are ~380 lines less code. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
7e180bd8 |
|
11-Mar-2012 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] rename lowcore field The 16 bit value at the lowcore location with offset 0x84 is the cpu address that is associated with an external interrupt. Rename the field from cpu_addr to ext_cpu_addr to make that clear. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
aa33c8cb |
|
27-Dec-2011 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] cleanup trap handling Move the program interruption code and the translation exception identifier to the pt_regs structure as 'int_code' and 'int_parm_long' and make the first level interrupt handler in entry[64].S store the two values. That makes it possible to drop 'prot_addr' and 'trap_no' from the thread_struct and to reduce the number of arguments to a lot of functions. Finally un-inline do_trap. Overall this saves 5812 bytes in the .text section of the 64 bit kernel. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
c5328901 |
|
27-Dec-2011 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] entry[64].S improvements Another round of cleanup for entry[64].S, in particular the program check handler looks more reasonable now. The code size for the 31 bit kernel has been reduced by 616 byte and by 528 byte for the 64 bit version. Even better the code is a bit faster as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
b6ef5bb3 |
|
30-Oct-2011 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] add TIF_SYSCALL thread flag Add an explicit TIF_SYSCALL bit that indicates if a task is inside a system call. The svc_code in the pt_regs structure is now only valid if TIF_SYSCALL is set. With this definition TIF_RESTART_SVC can be replaced with TIF_SYSCALL. Overall do_signal is a bit more readable and it saves a few lines of code. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
20b40a79 |
|
30-Oct-2011 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] signal race with restarting system calls For a ERESTARTNOHAND/ERESTARTSYS/ERESTARTNOINTR restarting system call do_signal will prepare the restart of the system call with a rewind of the PSW before calling get_signal_to_deliver (where the debugger might take control). For A ERESTART_RESTARTBLOCK restarting system call do_signal will set -EINTR as return code. There are two issues with this approach: 1) strace never sees ERESTARTNOHAND, ERESTARTSYS, ERESTARTNOINTR or ERESTART_RESTARTBLOCK as the rewinding already took place or the return code has been changed to -EINTR 2) if get_signal_to_deliver does not return with a signal to deliver the restart via the repeat of the svc instruction is left in place. This opens a race if another signal is made pending before the system call instruction can be reexecuted. The original system call will be restarted even if the second signal would have ended the system call with -EINTR. These two issues can be solved by dropping the early rewind of the system call before get_signal_to_deliver has been called and by using the TIF_RESTART_SVC magic to do the restart if no signal has to be delivered. The only situation where the system call restart via the repeat of the svc instruction is appropriate is when a SA_RESTART signal is delivered to user space. Unfortunately this breaks inferior calls by the debugger again. The system call number and the length of the system call instruction is lost over the inferior call and user space will see ERESTARTNOHAND/ ERESTARTSYS/ERESTARTNOINTR/ERESTART_RESTARTBLOCK. To correct this a new ptrace interface is added to save/restore the system call number and system call instruction length. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
0edc8faa |
|
30-Oct-2011 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] lowcore cleanup Remove the save_area_64 field from the 0xe00 - 0xf00 area in the lowcore. Use a free slot in the save_area array instead. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
7dd6b334 |
|
03-Aug-2011 |
Michael Holzheu <holzheu@linux.vnet.ibm.com> |
[S390] Add PSW restart shutdown trigger With this patch a new S390 shutdown trigger "restart" is added. If under z/VM "systerm restart" is entered or under the HMC the "PSW restart" button is pressed, the PSW located at 0 (31 bit) or 0x1a0 (64 bit) bit is loaded. Now we execute do_restart() that processes the restart action that is defined under /sys/firmware/shutdown_actions/on_restart. Currently the following actions are possible: reipl (default), stop, vmcmd, dump, and dump_reipl. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
144d634a |
|
24-Jul-2011 |
Jan Glauber <jan.glauber@gmail.com> |
[S390] fix s390 assembler code alignments The alignment is missing for various global symbols in s390 assembly code. With a recent gcc and an instruction like stgrl this can lead to a specification exception if the instruction uses such a mis-aligned address. Specify the alignment explicitely and while add it define __ALIGN for s390 and use the ENTRY define to save some lines of code. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
f2db2e6c |
|
23-May-2011 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] pfault: cpu hotplug vs missing completion interrupts On cpu hot remove a PFAULT CANCEL command is sent to the hypervisor which in turn will cancel all outstanding pfault requests that have been issued on that cpu (the same happens with a SIGP cpu reset). The result is that we end up with uninterruptible processes where the interrupt that would wake up these processes never arrives. In order to solve this all processes which wait for a pfault completion interrupt get woken up after a cpu hot remove. The worst case that could happen is that they fault again and in turn need to wait again. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
8eb4bd66 |
|
10-May-2011 |
Michael Holzheu <holzheu@linux.vnet.ibm.com> |
[S390] kernel: Initialize register 14 when starting new CPU When starting a new CPU we currently jump to start_secondary() without setting register 14 (the return address) correctly. Therefore on the stack frame for start_secondary an invalid return address is stored. This leads to wrong stack back traces in kernel dumps. Example: #00 [1f33fe48] cpu_idle at 10614a #01 [1f33fe90] start_secondary at 54fa88 #02 [1f33feb8] (null) at 0 <--- invalid To fix this start_secondary() is called now with basr/brasl that sets register 14 correctly. The output of the stack backtrace looks then like the following: #00 [1f33fe48] cpu_idle at 10614a #01 [1f33fe90] start_secondary at 54fa88 #02 [1f33feb8] restart_base at 54f41e <--- correct Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
5e9a2692 |
|
04-Jan-2011 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] ptrace cleanup Overhaul program event recording and the code dealing with the ptrace user space interface. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
1de3447a |
|
04-Jan-2011 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] 31 bit entry.S update. Make the code in the 31 bit entry.S code as similar as possible to the 64 bit version in entry64.S. That makes it easier to add new code to the first level interrupt handler that affects both 31 and 64 bit kernels. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
860dba45 |
|
04-Jan-2011 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] add kprobes annotations Add kprobes annotations to get the massive 'probe kernel.function("*") {}' stress test working. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
9ec27080 |
|
29-Oct-2010 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] fix kprobes single stepping Fix kprobes after git commit 1e54622e0403891b10f2105663e0f9dd595a1f17 broke it. The kprobe_handler is now called with interrupts in the state at the time of the breakpoint. The single step of the replaced instruction is done with interrupts off which makes it necessary to enable and disable the interupts in the kprobes code. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
baa07158 |
|
25-Oct-2010 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] cleanup system call parameter setup Do the setup of the stack overflow argument for the sixth system call parameter right before the branch to the system call function. That simplifies the system call parameter access code. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
f6649a7e |
|
25-Oct-2010 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] cleanup lowcore access from external interrupts Read external interrupts parameters from the lowcore in the first level interrupt handler in entry[64].S. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
1e54622e |
|
25-Oct-2010 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] cleanup lowcore access from program checks Read all required fields for program checks from the lowcore in the first level interrupt handler in entry[64].S. If the context that caused the fault was enabled for interrupts we can now re-enable the irqs in entry[64].S. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
f5cdac27 |
|
27-Jul-2010 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] Fix IRQ tracing in case of PER In case user space is single stepped (PER) the program check handler claims too early that IRQs are enabled on the return path. Subsequent checks will notice that the IRQ mask in the PSW and what lockdep thinks the IRQ mask should be do not correlate and therefore will print a warning to the console and disable lockdep. Fix this by doing all the work within the correct context. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
86f2552b |
|
17-May-2010 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] add breaking event address for user space Copy the last breaking event address from the lowcore to a new field in the thread_struct on each system entry. Add a new ptrace request PTRACE_GET_LAST_BREAK and a new utrace regset REGSET_LAST_BREAK to query the last breaking event. This is useful for debugging wild branches in user space code. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
6377981f |
|
17-May-2010 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] idle time accounting vs. machine checks A machine check can interrupt the i/o and external interrupt handler anytime. If the machine check occurs while the interrupt handler is waking up from idle vtime_start_cpu can get executed a second time and the int_clock / async_enter_timer values in the lowcore get clobbered. This can confuse the cpu time accounting. To fix this problem two changes are needed. First the machine check handler has to use its own copies of int_clock and async_enter_timer, named mcck_clock and mcck_enter_timer. Second the nested execution of vtime_start_cpu has to be prevented. This is done in s390_idle_check by checking the wait bit in the program status word. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
6a2df3a8 |
|
17-May-2010 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] improve irq tracing code in entry[64].S The system call path in entry[64].S is run with interrupts enabled. Remove the irq tracing check from the system call exit code. If a program check interrupted a context enabled for interrupts do a call to trace_irq_off_caller in the program check handler before branching to the system call exit code. Restructure the system call and io interrupt return code to avoid avoid the lpsw[e] to disable machine checks. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
43d399d2 |
|
17-May-2010 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] cleanup sysc_work and io_work code Cleanup the #ifdef mess at io_work in entry[64].S and streamline the TIF work code of the system call and io exit path. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
176b1803 |
|
09-Apr-2010 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] fix io_return critical section cleanup If a machine check interrupts the io interrupt handler on one of the instructions between io_return and io_leave the critical section cleanup code will move the return psw to io_work_loop. By doing that the switch from the asynchronous interrupt stack to the process stack is skipped. If e.g. TIF_NEED_RESCHED is set things break because the scheduler is called with the asynchronous interrupts stack. Moving the psw back to io_return instead fixes the problem. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
cbb870c8 |
|
26-Feb-2010 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] Cleanup struct _lowcore usage and defines. Use asm offsets to make sure the offset defines to struct _lowcore and its layout don't get out of sync. Also add a BUILD_BUG_ON() which checks that the size of the structure is sane. And while being at it change those sites which use odd casts to access the current lowcore. These should use S390_lowcore instead. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
21ec7f6d |
|
27-Jan-2010 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] fix single stepped svcs with TRACE_IRQFLAGS=y If irq flags tracing is enabled the TRACE_IRQS_ON macros expands to a function call which clobbers registers %r0-%r5. The macro is used in the code path for single stepped system calls. The argument registers %r2-%r6 need to be restored from the stack before the system call function is called. Cc: stable@kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
bcc6525f |
|
13-Nov-2009 |
Christian Borntraeger <borntraeger@de.ibm.com> |
[S390] s390: fix single stepping on svc0 On s390 there are two ways of specifying the system call number for the svc instruction. The standard way is to use the immediate field in the instruction (or to use EXecute for values unknown during assemble time). This can encode 256 system calls. The kernel ABI also allows to put the system call number in r1 and then execute svc 0 to enable system call numbers > 255. It turns out that single stepping svc 0 is broken, since the PER program check handler uses r1. We have to use a different register. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
a8c3cb49 |
|
11-Sep-2009 |
Hendrik Brueckner <brueckner@linux.vnet.ibm.com> |
[S390] move (io|sysc)_restore_trace_psw into .data section The sysc_restore_trace_psw and io_restore_trace_psw storage locations are created in the .text section. When creating and IPLing from a named saved system (NSS), writing to these locations causes a protection exception (because the .text section is mapped as shared read-only in the NSS). To permit write access, move the storage locations into the .data section. The problem occurs only when CONFIG_TRACE_IRQFLAGS is set. The git commmit that has introduced these variables is: 411788ea7fca01ee803af8225ac35807b4d02050 Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
66700001 |
|
24-Aug-2009 |
Josh Stone <jistone@redhat.com> |
tracing: Rename FTRACE_SYSCALLS for tracepoints s/HAVE_FTRACE_SYSCALLS/HAVE_SYSCALL_TRACEPOINTS/g s/TIF_SYSCALL_FTRACE/TIF_SYSCALL_TRACEPOINT/g The syscall enter/exit tracing is no longer specific to just ftrace, so they now have names that reflect their tie to tracepoints instead. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <1251150194-1713-2-git-send-email-jistone@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
#
9bf1226b |
|
12-Jun-2009 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] ftrace: add system call tracer support System call tracer support for s390. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
bcf5cef7 |
|
12-Jun-2009 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] secure computing arch backend Enable secure computing on s390 as well. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
5b409ed1 |
|
14-Apr-2009 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] cpu hotplug and accounting values Reset the cpu timer to the maximum value and correctly initialize the cpu accounting values in the lowcore when the cpu is started. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
9cfb9b3c |
|
31-Dec-2008 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[PATCH] improve idle cputime accounting Distinguish the cputime of the idle process where idle is actually using cpu cycles from the cputime where idle is sleeping on an enabled wait psw. The former is accounted as system time, the later as idle time. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
c185b783 |
|
25-Dec-2008 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] Remove config options. On s390 we always want to run with precise cputime accounting. Remove the config options VIRT_TIMER and VIRT_CPU_ACCOUNTING. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
8f2961c3 |
|
25-Dec-2008 |
Al Viro <viro@zeniv.linux.org.uk> |
[S390] audit: get s390 ret_from_fork in sync with other architectures On s390 we have ret_from_fork jump not to the "do all work we normally do on return from syscall" as on x86, ppc, etc., but to the "do all such work except audit". Historical reasons - the codepath triggered when we have AUDIT process flag set is separated from the normall one and they converge at sysc_return, which is the common part of post-syscall work. And does not include calling audit_syscall_exit() - that's done in the end of sysc_tracesys path, just before that path jumps to sysc_return. IOW, the child returning from fork()/clone()/vfork() doesn't call audit_syscall_exit() at all, so no matter what we do with its audit context, we are not going to see the audit entry. The fix is simple: have ret_from_fork go to the point just past the call of sys_.... in the 'we have AUDIT flag set' path. There we have (64bit variant; for 31bit the situation is the same): sysc_tracenogo: tm __TI_flags+7(%r9),(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT) jz sysc_return la %r2,SP_PTREGS(%r15) # load pt_regs larl %r14,sysc_return # return point is sysc_return jg do_syscall_trace_exit which is precisely what we need - check the flag, bugger off to sysc_return if not set, otherwise call do_syscall_trace_exit() and bugger off to sysc_return. r9 has just been properly set by ret_from_fork itself, so we are fine. Tested on s390x, seems to work fine. WARNING: it's been about 16 years since my last contact with 3X0 assembler[1], so additional review would be very welcome. I don't think I've managed to screw it up, but... [1] that *was* in another country and besides, the box is dead... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
59da2139 |
|
27-Nov-2008 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] fix system call parameter functions. syscall_get_nr() currently returns a valid result only if the call chain of the traced process includes do_syscall_trace_enter(). But collect_syscall() can be called for any sleeping task, the result of syscall_get_nr() in general is completely bogus. To make syscall_get_nr() work for any sleeping task the traps field in pt_regs is replace with svcnr - the system call number the process is executing. If svcnr == 0 the process is not on a system call path. The syscall_get_arguments and syscall_set_arguments use regs->gprs[2] for the first system call parameter. This is incorrect since gprs[2] may have been overwritten with the system call number if the call chain includes do_syscall_trace_enter. Use regs->orig_gprs2 instead. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
50bec4ce |
|
14-Nov-2008 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] ftrace: fix kernel stack backchain walking With CONFIG_IRQSOFF_TRACER the trace_hardirqs_off() function includes a call to __builtin_return_address(1). But we calltrace_hardirqs_off() from early entry code. There we have just a single stack frame. So this results in a kernel stack backchain walk that would walk beyond the kernel stack. Following the NULL terminated backchain this results in a lowcore read access. To fix this we simply call trace_hardirqs_off_caller() and pass the current instruction pointer. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
af4c6874 |
|
14-Nov-2008 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] lockdep: fix compile bug arch/s390/kernel/built-in.o: In function `cleanup_io_leave_insn': mem_detect.c:(.text+0x10592): undefined reference to `lockdep_sys_exit' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
753c4dd6 |
|
10-Oct-2008 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] ptrace changes * System call parameter and result access functions * Add tracehook calls * Split syscall_trace into two functions do_syscall_trace_enter and do_syscall_trace_exit Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
2688905e |
|
07-May-2008 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] s390: Optimize user and work TIF check On return from syscall or interrupt, we have to check if we return to userspace (likely) and if there is work todo (less likely) to decide if we handle the work. We can optimize this check: we first check for the less likely work case and then check for userspace. This patch is also a preparation for an additional patch, that fixes a bug in KVM dealing with cpu bound guests. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
02a029b3 |
|
30-Apr-2008 |
Roland McGrath <roland@redhat.com> |
signals: s390: renumber TIF_RESTORE_SIGMASK TIF_RESTORE_SIGMASK no longer needs to be in the _TIF_WORK_* masks. Those low bits are scarce, and are all used up now. Renumber TIF_RESTORE_SIGMASK to free one up. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2bc89b5e |
|
05-Feb-2008 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] Fix couple of section mismatches. Fix couple of section mismatches. And since we touch the code anyway change the IPL code to use C99 initializers. Cc: Michael Holzheu <holzheu@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
ab1809b4 |
|
04-Dec-2007 |
Christian Borntraeger <borntraeger@de.ibm.com> |
[S390] Fix compile error on 31bit without preemption Commit b8e7a54cd06b0b0174029ef3a7f5a1415a2c28f2 introduced a compile error if CONFIG_PREEMPT is not set: arch/s390/kernel/built-in.o: In function `cleanup_io_leave_insn': /space/kvm/arch/s390/kernel/entry.S:(.text+0xbfce): undefined reference to `preempt_schedule_irq' This patch hides preempt_schedule_irq if CONFIG_PREEMPT is not set. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
b8e7a54c |
|
20-Nov-2007 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] Fix kernel preemption. When returning from IRQ handling and TIF_NEED_RESCHED is set we must call preempt_schedule_irq() instead of schedule(). Otherwise the BKL might be unlocked in schedule() and therfore everything that relies on the BKL is broken. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
411788ea |
|
20-Nov-2007 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] Fix irq tracing and lockdep_sys_exit calls. Current support for TRACE_IRQFLAGS and lockdep_sys_exit is broken. IRQ flag tracing is broken for program checks. Even worse is that the newly introduced calls to lockdep_sys_exit are in the critical section code which is not supposed to call any C functions. In addition the checks if locks are still held are also done when returning to kernel code which is broken as well. Fix all this by disabling interrupts and machine checks at the exit paths and then do the appropriate checks and calls. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
523b44cf |
|
11-Oct-2007 |
Heiko Carstens <hca@linux.ibm.com> |
lockdep: s390: connect the sysexit hook Run the lockdep_sys_exit hook before returning to user space. Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b771aeac |
|
26-Jul-2007 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] Fix IRQ tracing. If a machine check is pending and the external or I/O interrupt handler returns to userspace io_mcck_pending is going to call s390_handle_mcck. Before this happens a call to TRACE_IRQS_ON was already made since we know that we are going back to userspace and hence interrupts will be enabled. So there was an indication that interrupts are enabled while in reality they are still disabled. s390_handle_mcck will do a local_irq_save/restore pair and confuse lockdep which later complains about inconsistent irq tracing. To solve this just call trace_hardirqs_off before calling s390_handle_mcck and trace_hardirqs_on afterwards. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
987ad70a |
|
10-Jul-2007 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] system call optimization. After the in-kernel system call has been remove the system call path can be optimized. The problem state bit of the old psw is always set between system_call and sysc_do_svc. SAVE_ALL_SVC uses this information to avoid two instructions. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
84b36a8e |
|
19-Jun-2007 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] Fix yet another two section mismatches. WARNING: arch/s390/kernel/built-in.o(.text+0xb92a): Section mismatch: reference to .init.text:start_secondary (between 'restart_addr' and 'stack_overflow') WARNING: arch/s390/appldata/built-in.o(.data+0xdc): Section mismatch: reference to .init.text: (between 'appldata_nb' and 'appldata_timer_lock') Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
03ff9a23 |
|
27-Apr-2007 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] System call cleanup. Remove system call glue for sys_clone, sys_fork, sys_vfork, sys_execve, sys_sigreturn, sys_rt_sigreturn and sys_sigaltstack. Call do_execve from kernel_execve directly, move pt_regs to the right place and branch to sysc_return to start the user space program. This removes the last in-kernel system call. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
25d83cbf |
|
28-Sep-2006 |
Heiko Carstens <hca@linux.ibm.com> |
[S390] Whitespace cleanup. Huge s390 assembly files whitespace cleanup. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
4ba069b8 |
|
20-Sep-2006 |
Michael Grundy <grundym@us.ibm.com> |
[S390] add kprobes support. Signed-off-by: Michael Grundy <grundym@us.ibm.com> Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
1f194a4c |
|
03-Jul-2006 |
Heiko Carstens <hca@linux.ibm.com> |
[PATCH] lockdep: irqtrace subsystem, s390 support irqtrace support for s390. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
d882b172 |
|
01-Jul-2006 |
Heiko Carstens <hca@linux.ibm.com> |
[PATCH] s390: put sys_call_table into .rodata section and write protect it Put s390's syscall tables into .rodata section and write protect this section to prevent misuse of it. Suggested by Arjan van de Ven <arjan@infradead.org>. Cc: Arjan van de Ven <arjan@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
6ab3d562 |
|
30-Jun-2006 |
Jörn Engel <joern@wohnheim.fh-wedel.de> |
Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
8f27766a |
|
29-Jun-2006 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] remove export of sys_call_table Remove export of the sys_call_table symbol to prevent the misuse of it. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
63b12246 |
|
29-Jun-2006 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] virtual cpu accounting vs. machine checks. If a machine checks interrupts the external or the i/o interrupt handler before they have completed the cpu time calculations, the accounting goes wrong. After the cpu returned from the machine check handler to the interrupted interrupt handler, a negative cpu time delta can occur. If the accumulated cpu time in lowcore is small enough this value can get negative as well. The next jiffy interrupt will pick up that negative value, shift it by 12 and add the now huge positive value to the cpu time of the process. To solve this the machine check handler is modified not to change any of the timestamps in the lowcore if the machine check interrupted kernel context. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
54dfe5dd |
|
01-Feb-2006 |
Heiko Carstens <hca@linux.ibm.com> |
[PATCH] s390: Add support for new syscalls/TIF_RESTORE_SIGMASK Add support for the new *at, pselect6 and ppoll system calls. This includes adding required support for TIF_RESTORE_SIGMASK. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e1c3ad96 |
|
07-Nov-2005 |
Heiko Carstens <hca@linux.ibm.com> |
[PATCH] s390: signal delivery Always create all signal frames for pending signals before returning to userspace, not just a single one. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
6add9f7f |
|
16-Sep-2005 |
Peter Oberparleiter <peter.oberparleiter@de.ibm.com> |
[PATCH] s390: kernel stack corruption When an asynchronous interruption occurs during the execution of the 'critical section' within the generic interruption handling code (entry.S), a faulty check for a userspace PSW may result in a corrupted kernel stack pointer which subsequently triggers a stack overflow check. Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
0013a854 |
|
09-Sep-2005 |
Sam Ravnborg <sam@mars.(none)> |
kbuild: m68k,parisc,ppc,ppc64,s390,xtensa use generic asm-offsets.h support Delete obsoleted parts form arch makefiles and rename to asm-offsets.h Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
ae6aa2ea |
|
03-Sep-2005 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[PATCH] s390: machine check handler bugs The new machine check handler still has a few bugs. 1) The system entry time has to be stored in the machine check handler, 2) the machine check return psw may not be stored at the usual place because it might overwrite the return psw of the interrupted context, 3) the return address for the call to s390_handle_mcck in the i/o interrupt handler is not correct, 4) the system call cleanup has to take the different save area of the machine check handler into account, 5) the machine check handler may not call UPDATE_VTIME before CREATE_STACK_FRAME, and 6) the io leave path needs a critical section cleanup to make sure that the TIF_MCCK_PENDING bit is really checked before switching back to user space. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
8ffa7405 |
|
27-Jul-2005 |
Heiko Carstens <hca@linux.ibm.com> |
[PATCH] s390: cpu timer reset in machine check handler Fix wrong move direction of timer values for cpu accounting in case of a machine check that indicates a broken cpu timer. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
77fa2245 |
|
25-Jun-2005 |
Heiko Carstens <hca@linux.ibm.com> |
[PATCH] s390: improved machine check handling Improved machine check handling. Kernel is now able to receive machine checks while in kernel mode (system call, interrupt and program check handling). Also register validation is now performed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|