.. SPDX-License-Identifier: GPL-2.0

======================================================
Control-flow Enforcement Technology (CET) Shadow Stack
======================================================

CET Background
==============

Control-flow Enforcement Technology (CET) covers several related x86 processor
features that provide protection against control flow hijacking attacks. CET
can protect both applications and the kernel.

CET introduces shadow stack and indirect branch tracking (IBT). A shadow stack
is a secondary stack allocated from memory which cannot be directly modified by
applications. When executing a CALL instruction, the processor pushes the
return address to both the normal stack and the shadow stack. Upon
function return, the processor pops the shadow stack copy and compares it
to the normal stack copy. If the two differ, the processor raises a
control-protection fault. IBT verifies that indirect CALL/JMP targets are
intended, as marked by the compiler with 'ENDBR' opcodes. Not all CPUs have
both Shadow Stack and Indirect Branch Tracking. Today in the 64-bit kernel,
only userspace shadow stack and kernel IBT are supported.
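
The CALL/RET behavior described above can be illustrated with a toy software
model. This is only a sketch: the struct and function names (toy_cpu, toy_call,
toy_ret) and the fixed depth are made up for illustration, and real hardware
performs these checks itself rather than through any such API.

```c
#include <stddef.h>
#include <stdint.h>

#define SHSTK_DEPTH 64   /* arbitrary depth chosen for this model */

/* Toy model: two parallel stacks that hold return addresses only. */
struct toy_cpu {
    uint64_t normal[SHSTK_DEPTH];   /* writable by the "application" */
    uint64_t shadow[SHSTK_DEPTH];   /* not directly writable by it */
    size_t depth;
};

/* CALL: push the return address onto both stacks. Returns 0, or -1 if full. */
int toy_call(struct toy_cpu *cpu, uint64_t ret_addr)
{
    if (cpu->depth >= SHSTK_DEPTH)
        return -1;
    cpu->normal[cpu->depth] = ret_addr;
    cpu->shadow[cpu->depth] = ret_addr;
    cpu->depth++;
    return 0;
}

/*
 * RET: pop both copies and compare them.
 * Returns 0 and stores the address on a match. Returns -1 to model a
 * control-protection fault on a mismatch (or on an empty stack).
 */
int toy_ret(struct toy_cpu *cpu, uint64_t *ret_addr)
{
    if (cpu->depth == 0)
        return -1;
    cpu->depth--;
    if (cpu->normal[cpu->depth] != cpu->shadow[cpu->depth])
        return -1;
    *ret_addr = cpu->normal[cpu->depth];
    return 0;
}
```

Overwriting an entry in 'normal' between toy_call() and toy_ret() stands in
for a stack smashing attack: the shadow copy no longer matches, so toy_ret()
reports the modeled fault instead of returning the corrupted address.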

Requirements to use Shadow Stack
================================

To use userspace shadow stack you need hardware that supports it, a kernel
configured with it, and userspace libraries compiled with it.

The kernel Kconfig option is X86_USER_SHADOW_STACK. When compiled in, shadow
stacks can be disabled at runtime with the kernel parameter: nousershstk.

To build a user shadow stack enabled kernel, Binutils v2.29 or LLVM v6 or later
is required.

At run time, /proc/cpuinfo shows CET features if the processor supports
CET. "user_shstk" means that userspace shadow stack is supported on the current
kernel and hardware.
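
A program can perform the /proc/cpuinfo check itself. The following is a
minimal sketch: the helper names (line_has_flag, cpuinfo_has_user_shstk) and
the line buffer size are assumptions of this example, and it only inspects
lines that begin with "flags".

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if 'flag' appears as a whole whitespace-separated word in 'line',
 * else 0. Whole-word matching avoids hits on longer names that merely
 * contain the flag.
 */
int line_has_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = line;

    if (len == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

        if (starts && ends)
            return 1;
        p++;
    }
    return 0;
}

/* Return 1 if /proc/cpuinfo lists "user_shstk", 0 if not, -1 on I/O error. */
int cpuinfo_has_user_shstk(void)
{
    char line[8192];   /* assumed large enough for one "flags" line */
    FILE *f = fopen("/proc/cpuinfo", "r");
    int found = 0;

    if (!f)
        return -1;
    while (!found && fgets(line, sizeof(line), f)) {
        if (strncmp(line, "flags", 5) == 0)
            found = line_has_flag(line, "user_shstk");
    }
    fclose(f);
    return found;
}
```

A result of 1 only says that the kernel and hardware support the feature. It
does not say whether any particular process is running with it enabled.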

Application Enabling
====================

An application's CET capability is marked in its ELF note and can be verified
from readelf/llvm-readelf output::

    readelf -n <application> | grep -a SHSTK
        properties: x86 feature: SHSTK

The kernel does not process these application markers directly. Applications
or loaders must enable CET features using the interface described in the
"Enabling arch_prctl()'s" section below. Typically this would be done in the
dynamic loader or static runtime objects, as is the case in GLIBC.

Enabling arch_prctl()'s
=======================

ELF features should be enabled by the loader using the arch_prctl() calls
below. They are only supported in 64-bit user applications. These operate on
the features on a per-thread basis. The enablement status is inherited on
clone, so if the feature is enabled on the first thread, it will propagate to
all the threads in an app.

arch_prctl(ARCH_SHSTK_ENABLE, unsigned long feature)
    Enable a single feature specified in 'feature'. Can only operate on
    one feature at a time.

arch_prctl(ARCH_SHSTK_DISABLE, unsigned long feature)
    Disable a single feature specified in 'feature'. Can only operate on
    one feature at a time.

arch_prctl(ARCH_SHSTK_LOCK, unsigned long features)
    Lock in features at their current enabled or disabled status. 'features'
    is a mask of all features to lock. All bits set are processed, unset bits
    are ignored. The mask is ORed with the existing value, so any feature bits
    set here cannot be enabled or disabled afterwards.

arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features)
    Unlock features. 'features' is a mask of all features to unlock. All
    bits set are processed, unset bits are ignored. Only works via ptrace.

arch_prctl(ARCH_SHSTK_STATUS, unsigned long addr)
    Copy the currently enabled features to the address passed in addr. The
    features are described using the same bits that the other calls take in
    'features'.

The return values are as follows. On success, return 0. On error, errno can
be::

        -EPERM if any of the passed features are locked.
        -ENOTSUPP if the feature is not supported by the hardware or
         kernel.
        -EINVAL if the arguments are invalid (non-existent feature, etc.).
        -EFAULT if the information could not be copied back to userspace.
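
As an illustration of the read-only ARCH_SHSTK_STATUS call, the sketch below
queries the calling thread's features through the raw syscall and formats the
result. The helper names (shstk_query, shstk_mask_to_str) are hypothetical.
The numeric constants are believed to match the kernel's x86 uapi
asm/prctl.h and are defined locally only so the example builds with older
headers; check them against the installed headers before relying on them.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Assumed to match the kernel's x86 uapi asm/prctl.h. */
#ifndef ARCH_SHSTK_STATUS
#define ARCH_SHSTK_STATUS 0x5005
#endif
#ifndef ARCH_SHSTK_SHSTK
#define ARCH_SHSTK_SHSTK (1ULL << 0)
#endif
#ifndef ARCH_SHSTK_WRSS
#define ARCH_SHSTK_WRSS  (1ULL << 1)
#endif

/* Format a feature mask as "shstk", "wrss", "shstk wrss" or "". */
int shstk_mask_to_str(unsigned long long mask, char *buf, size_t len)
{
    int shstk = (mask & ARCH_SHSTK_SHSTK) != 0;
    int wrss = (mask & ARCH_SHSTK_WRSS) != 0;

    return snprintf(buf, len, "%s%s%s",
                    shstk ? "shstk" : "",
                    (shstk && wrss) ? " " : "",
                    wrss ? "wrss" : "");
}

/*
 * Query the calling thread's enabled features.
 * Returns 0 and fills *features on success, or a negative errno value.
 */
int shstk_query(unsigned long *features)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
    if (syscall(SYS_arch_prctl, ARCH_SHSTK_STATUS, features) == -1)
        return -errno;
    return 0;
#else
    (void)features;
    return -ENOSYS;   /* this sketch's choice for non-x86-64 builds */
#endif
}
```

Enabling is deliberately left out of this example. A function that enables
shadow stack and then returns would presumably fault, because its own return
address was never pushed to the new shadow stack. This is one reason the
enabling step is normally left to the loader.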

The supported feature bits are::

    ARCH_SHSTK_SHSTK - Shadow stack
    ARCH_SHSTK_WRSS  - WRSS

Currently shadow stack and WRSS are supported via this interface. WRSS
can only be enabled with shadow stack, and is automatically disabled
if shadow stack is disabled.
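
The enable, disable and lock rules above can be summarized as a small
userspace state model. This is not kernel code. The names (feat_state,
model_enable, and so on) are invented for illustration, the order of the
error checks is a guess, and returning -EPERM when WRSS is requested without
shadow stack is a choice of this model rather than something the text
specifies.

```c
#include <errno.h>

#define FEAT_SHSTK (1UL << 0)   /* stands in for ARCH_SHSTK_SHSTK */
#define FEAT_WRSS  (1UL << 1)   /* stands in for ARCH_SHSTK_WRSS */

/* Per-thread feature state tracked by the model. */
struct feat_state {
    unsigned long enabled;
    unsigned long locked;
};

/* Enable and disable accept exactly one known feature bit. */
static int single_feature(unsigned long f)
{
    return f == FEAT_SHSTK || f == FEAT_WRSS;
}

int model_enable(struct feat_state *s, unsigned long feature)
{
    if (!single_feature(feature))
        return -EINVAL;
    if (s->locked & feature)
        return -EPERM;
    /* Model choice: WRSS without shadow stack is refused with -EPERM. */
    if (feature == FEAT_WRSS && !(s->enabled & FEAT_SHSTK))
        return -EPERM;
    s->enabled |= feature;
    return 0;
}

int model_disable(struct feat_state *s, unsigned long feature)
{
    if (!single_feature(feature))
        return -EINVAL;
    if (s->locked & feature)
        return -EPERM;
    s->enabled &= ~feature;
    /* WRSS is dropped automatically when shadow stack is disabled. */
    if (feature == FEAT_SHSTK)
        s->enabled &= ~FEAT_WRSS;
    return 0;
}

/* Locking is sticky: the mask is ORed into the existing lock value. */
int model_lock(struct feat_state *s, unsigned long features)
{
    s->locked |= features;
    return 0;
}
```

In this model a loader would typically enable FEAT_SHSTK and then lock it, so
that later code in the thread can no longer turn it off.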

Proc Status
===========

To check if an application is actually running with shadow stack, the
user can read /proc/$PID/status. It will report "wrss" or "shstk"
depending on what is enabled. The lines look like this::

    x86_Thread_features: shstk wrss
    x86_Thread_features_locked: shstk wrss
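
Lines in this format are easy to parse. The sketch below turns one status
line into a bitmask. The function name, the ST_* constants and the internal
buffer size are assumptions of this example, not an existing interface.

```c
#include <stdio.h>
#include <string.h>

#define ST_SHSTK 0x1   /* "shstk" was listed */
#define ST_WRSS  0x2   /* "wrss" was listed */

/*
 * Parse one line of /proc/$PID/status. If the line starts with 'key'
 * (for example "x86_Thread_features:"), return a bitmask of the feature
 * words found after it. Return -1 if the line has a different key.
 */
int parse_thread_features(const char *line, const char *key)
{
    size_t klen = strlen(key);
    char copy[256];   /* assumed to be enough for the feature list */
    char *tok;
    int mask = 0;

    if (strncmp(line, key, klen) != 0)
        return -1;
    /* strtok() modifies its input, so work on a private copy. */
    snprintf(copy, sizeof(copy), "%s", line + klen);
    for (tok = strtok(copy, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
        if (strcmp(tok, "shstk") == 0)
            mask |= ST_SHSTK;
        else if (strcmp(tok, "wrss") == 0)
            mask |= ST_WRSS;
    }
    return mask;
}
```

Passing the key with its trailing colon keeps "x86_Thread_features:" from
also matching the "x86_Thread_features_locked:" line.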

Implementation of the Shadow Stack
==================================

Shadow Stack Size
-----------------

A task's shadow stack is allocated from memory to a fixed size of
MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
the maximum size of the normal stack, but capped to 4 GB. In the case
of the clone3 syscall, there is a stack size passed in and shadow stack
uses this instead of the rlimit.
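
The sizing rule can be written out as a short calculation. This is a sketch
of the rule as stated above, with hypothetical function names. Any rounding
or alignment the kernel may apply is not modeled, and a zero 'clone3_size' is
used here to mean that no explicit stack size was given.

```c
#include <sys/resource.h>

#define SHSTK_CAP (4ULL << 30)   /* the 4 GB cap from the text */

/*
 * Pick the shadow stack size: an explicit stack size (as passed to clone3)
 * wins when non-zero, otherwise use MIN(stack rlimit, 4 GB).
 */
unsigned long long shstk_size(unsigned long long clone3_size,
                              unsigned long long stack_rlimit)
{
    if (clone3_size)
        return clone3_size;
    return stack_rlimit < SHSTK_CAP ? stack_rlimit : SHSTK_CAP;
}

/* Apply the rule to the calling process's current soft RLIMIT_STACK. */
unsigned long long shstk_size_for_self(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    /* An unlimited rlimit is a very large value, so the cap applies. */
    return shstk_size(0, (unsigned long long)rl.rlim_cur);
}
```

With a typical 8 MB stack limit the result is 8 MB, and with an unlimited
stack limit the result is the 4 GB cap.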

Signal
------

The main program and its signal handlers use the same shadow stack. Because
the shadow stack stores only return addresses, a large shadow stack covers
the condition that both the program stack and the signal alternate stack run
out.

When a signal happens, the old pre-signal state is pushed on the stack. When
shadow stack is enabled, the shadow stack specific state is pushed onto the
shadow stack. Today this is only the old SSP (shadow stack pointer), pushed
in a special format with bit 63 set. On sigreturn this old SSP token is
verified and restored by the kernel. The kernel will also push the normal
restorer address to the shadow stack to help userspace avoid a shadow stack
violation on the sigreturn path that goes through the restorer.

So the shadow stack signal frame format is as follows::

    |1...old SSP| - Pointer to old pre-signal ssp in sigframe token format
                    (bit 63 set to 1)
    |        ...| - Other state may be added in the future
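
The sigframe token format can be modeled with two small helpers. This is an
illustrative sketch only: the function names are invented, and the error
handling (rejecting an SSP that already has bit 63 set) is an assumption of
this model.

```c
#include <stdint.h>

#define SIGFRAME_TOKEN_BIT (1ULL << 63)

/*
 * Encode an old SSP as a sigframe token by setting bit 63.
 * Returns 0, or -1 if bit 63 is already set in the input, since such a
 * value could not be told apart from a token.
 */
int ssp_to_token(uint64_t ssp, uint64_t *token)
{
    if (ssp & SIGFRAME_TOKEN_BIT)
        return -1;
    *token = ssp | SIGFRAME_TOKEN_BIT;
    return 0;
}

/*
 * Verify a token and recover the old SSP by clearing bit 63.
 * Returns 0, or -1 if bit 63 is clear and the value is not a token.
 */
int token_to_ssp(uint64_t token, uint64_t *ssp)
{
    if (!(token & SIGFRAME_TOKEN_BIT))
        return -1;
    *ssp = token & ~SIGFRAME_TOKEN_BIT;
    return 0;
}
```

Because bit 63 is never set in a userspace address, a value with that bit set
cannot be mistaken for an ordinary return address on the shadow stack.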


32-bit ABI signals are not supported in shadow stack processes. Linux prevents
32-bit execution while shadow stack is enabled by allocating shadow stacks
outside of the 32-bit address space. When execution enters 32-bit mode, either
via far call or returning to userspace, a #GP is generated by the hardware,
which will be delivered to the process as a segfault. When transitioning to
userspace, the register state will be as if the userspace IP being returned to
caused the segfault.

Fork
----

The shadow stack's VMA has the VM_SHADOW_STACK flag set; its PTEs are required
to be read-only and dirty. When a shadow stack PTE is not read-only and dirty,
a shadow stack access triggers a page fault with the shadow stack access bit
set in the page fault error code.

When a task forks a child, its shadow stack PTEs are copied and both the
parent's and the child's shadow stack PTEs are cleared of the dirty bit.
Upon the next shadow stack access, the resulting shadow stack page fault
is handled by page copy/re-use.

When a pthread child is created, the kernel allocates a new shadow stack
for the new thread. New shadow stack creation behaves like mmap() with respect
to ASLR behavior. Similarly, on thread exit the thread's shadow stack is
disabled.

Exec
----

On exec, shadow stack features are disabled by the kernel. At that point,
userspace can choose to re-enable or lock them.