1%
2% Copyright 2014, General Dynamics C4 Systems
3%
4% This software may be distributed and modified according to the terms of
5% the GNU General Public License version 2. Note that NO WARRANTY is provided.
6% See "LICENSE_GPLv2.txt" for details.
7%
8% @TAG(GD_GPL)
9%
10
\chapter{\label{ch:threads}Threads and Scheduling Contexts}
12
13\section{Threads}
14\label{sec:threads}
15
16seL4 provides threads to represent an execution context, while scheduling contexts are used to manage
17processor time.
18A thread is represented in seL4 by its thread control block
19object (\obj{TCB}) and a scheduling context by a scheduling context object (\obj{SCO}).
Threads cannot run unless they are bound to, or receive, a scheduling context.
21
\subsection{Thread Control Blocks}
23
24Each \obj{TCB} has an associated CSpace (see
25\autoref{ch:cspace}) and VSpace (see \autoref{ch:vspace}) which
26may be shared with other threads. A \obj{TCB} may also have an IPC buffer
27(see  \autoref{ch:ipc}), which is used to pass extra arguments during IPC
28or kernel object invocation that do not fit in the architecture-defined message
registers. While a thread is not required to have an IPC buffer, without one it cannot
perform most kernel invocations, as they require capability transfer.
32Each thread belongs to exactly one security domain (see
33\autoref{sec:domains}).
34%FIXME: there is much more information held in the TCB!
35
36\subsection{Thread Creation}
37\label{sec:thread_creation}
38
39Like other objects, \obj{TCB}s are created with the
40\apifunc{seL4\_Untyped\_Retype}{untyped_retype} method (see
41\autoref{sec:kernmemalloc}). A newly created thread is initially inactive. It
42is configured by setting its CSpace and VSpace with the
43\apifunc{seL4\_TCB\_SetSpace}{tcb_setspace}
44or \apifunc{seL4\_TCB\_Configure}{tcb_configure} methods and then calling
45\apifunc{seL4\_TCB\_WriteRegisters}{tcb_writeregisters} with an initial stack pointer and instruction
pointer. The thread can then be activated either by setting the
\texttt{resume\_target} parameter in the \apifunc{seL4\_TCB\_WriteRegisters}{tcb_writeregisters} invocation to true
or by separately calling the \apifunc{seL4\_TCB\_Resume}{tcb_resume} method. Both of these methods
place the thread in a runnable state. If the thread has a scheduling context it will begin running; on
multicore machines, the thread runs on the core that its scheduling context is configured for.
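The following sketch illustrates this sequence for a newly retyped \obj{TCB}. The capability names
(\texttt{tcb}, \texttt{cspace\_root}, \texttt{vspace\_root}, \texttt{stack\_top}, \texttt{entry\_point})
are placeholders, the register field names are architecture-specific, and the exact parameter lists of
the libsel4 stubs depend on the kernel configuration, so treat it as an outline rather than a definitive recipe:

\begin{verbatim}
#include <sel4/sel4.h>

/* Hypothetical capability slots and values set up elsewhere by the creator. */
extern seL4_CPtr tcb, cspace_root, vspace_root;
extern seL4_Word stack_top, entry_point;

void start_thread(void)
{
    /* Point the TCB at its CSpace and VSpace (no fault endpoint yet;
     * parameter order per the generated libsel4 stub). */
    seL4_TCB_SetSpace(tcb, seL4_CapNull, cspace_root, 0, vspace_root, 0);

    /* Give the thread an initial instruction pointer and stack.
     * Register field names shown here are for ARM/AArch64. */
    seL4_UserContext regs = {0};
    regs.pc = entry_point;
    regs.sp = stack_top;
    seL4_TCB_WriteRegisters(tcb, seL4_False, 0,
                            sizeof(regs) / sizeof(seL4_Word), &regs);

    /* Make the thread runnable; it starts executing once an active
     * scheduling context is bound to it. */
    seL4_TCB_Resume(tcb);
}
\end{verbatim}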
51
52\subsection{Thread Deactivation}
53\label{sec:thread_deactivation}
54
55The \apifunc{seL4\_TCB\_Suspend}{tcb_suspend} method deactivates a thread.
56Suspended threads can later be resumed.
57Their suspended state can be retrieved with the
58\apifunc{seL4\_TCB\_ReadRegisters}{tcb_readregisters} and
59\apifunc{seL4\_TCB\_CopyRegisters}{tcb_copyregisters} methods.
60They can also be reconfigured and
61reused or left suspended indefinitely if not needed. Threads will be
62automatically suspended when the last capability to their \obj{TCB} is
63deleted.
64% an example of which is demonstrated in \nameref{ex:second_thread}.
65
66\subsection{Scheduling Contexts}
67\label{sec:scheduling_contexts}
68
69Access to CPU execution time is controlled through scheduling context objects.
70Scheduling contexts are configured with a tuple of
71\textit{budget}$(b)$ and \textit{period} $(p)$, both in microseconds, set by 
72\apifunc{seL4\_SchedControl\_Configure}{schedcontrol_configure} (see \autoref{sec:sc_creation}).
73The tuple $(b, p)$ forms an upper bound on the thread's execution -- the kernel will not permit a 
74thread to run for more than $b$ out of every $p$ microseconds. However, $\frac{b}{p}$ does not 
75represent a lower bound on execution, as a thread must have the highest or equal highest priority 
76of all runnable threads to be guaranteed to be scheduled at all, and the kernel does not conduct 
an admission test. As a result, a given set of scheduling parameters is not necessarily schedulable. If
multiple threads have available budget concurrently, they are scheduled first-in, first-out, and
round-robin scheduling is applied once a thread's budget expires.
80
A scheduling context that is eligible to be picked by the scheduler, i.e.\ one with budget available, is
referred to as \emph{active}.  Budget charging and replenishment rules are different for round-robin
and sporadic threads.  For round-robin threads, the budget is charged each time the current node
switches away from the thread's scheduling context; once depleted, it is refilled immediately.
85
Threads where $b = p$ are treated as round-robin threads, where $b$ acts as a timeslice.
Otherwise the kernel uses \emph{sporadic servers} to enforce temporal isolation, which enforce the
property that a thread cannot consume more than $b$ microseconds of execution time within any period of length $p$.
89In theory, sporadic servers provide temporal isolation -- preventing threads from exceeding their allocated budget -- by using the following algorithm:
90
91\begin{itemize} 
\item When a thread starts executing at current time $T$, record $T_{s} = T$
93\item When a thread stops executing (blocks or is preempted), schedule a replenishment at $T_{s} + p$ for the
94amount of time consumed ($T - T_{s}$) and subtract that from the current replenishment being used.
95\end{itemize}
96
97seL4 implements this algorithm by maintaining an ordered list of sporadic replenishments -- 
98\texttt{refills} for brevity -- in each scheduling context. Each replenishment contains a tuple 
of the time from which it is eligible for use (\texttt{rTime}) and the amount of budget that replenishment provides
(\texttt{rAmount}).
While a thread is executing, it continuously drains budget from the \texttt{rAmount} at the head of the
replenishment list. If the head \texttt{rTime} is in the future, the thread bound to that
scheduling context is placed in a queue of threads waiting for more budget.
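The following sketch illustrates this bookkeeping. The types and helper names are hypothetical
simplifications; the in-kernel data structures and edge cases (merging refills, consumption spanning
multiple refills) differ in detail:

\begin{verbatim}
#include <stdint.h>

/* Hypothetical, simplified model of a scheduling context's refill list. */
typedef struct {
    uint64_t rTime;    /* time from which this refill may be used */
    uint64_t rAmount;  /* budget provided by this refill */
} refill_t;

typedef struct {
    uint64_t period;        /* p, in microseconds */
    refill_t refills[8];    /* bounded list, ordered by rTime */
    unsigned head, count;
} sched_context_t;

/* Called when the thread bound to sc starts executing at time now. */
static uint64_t thread_start(sched_context_t *sc, uint64_t now)
{
    (void)sc;
    return now;                        /* record T_s */
}

/* Called when the thread stops (blocks or is preempted) at time now. */
static void thread_stop(sched_context_t *sc, uint64_t t_start, uint64_t now)
{
    uint64_t consumed = now - t_start; /* T - T_s */

    /* Charge the consumed time against the head refill (a real
     * implementation handles consumption spanning several refills). */
    sc->refills[sc->head].rAmount -= consumed;

    /* Schedule a replenishment of the consumed amount one period after
     * the thread started executing. */
    if (sc->count < 8) {
        unsigned tail = (sc->head + sc->count) % 8;
        sc->refills[tail].rTime = t_start + sc->period;
        sc->refills[tail].rAmount = consumed;
        sc->count++;
    }
    /* A real implementation merges refills rather than dropping them
     * when the list is full. */
}
\end{verbatim}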
104
Round-robin threads are treated the same as sporadic threads except for one case: how the
budget is charged. Round-robin threads have only two refills, both of which are always ready to be
107used. When a round-robin thread stops executing, budget is moved from the head to the tail 
108replenishment. Once the head budget is consumed, the thread is removed from the scheduling queue 
109for its priority and appended at the tail.
110
Sporadic threads behave differently depending on the number of replenishments available, which
must be bounded. Two parameters determine the size of the replenishment list:
113
114\begin{itemize}
115\item The maximum number of refills in a single scheduling context is determined by the size
116      of the scheduling context when created by \apifunc{seL4\_Untyped\_Retype}{untyped_retype}.
\item A per-scheduling-context parameter, \texttt{extra\_refills}, limits the number of refills for
that specific scheduling context. This value is added to the base value of 2 and is capped by the
size of the scheduling context object.
120\end{itemize}
121 
Threads that have short execution times (e.g.\ interrupt handlers) and are not frequently preempted
should have fewer refills, while longer-running threads with large values of $b$ should have more.
Threads bound to a scheduling context with 0 extra refills will behave periodically -- tasks
that use up their head replenishment, or call yield, will not be scheduled again until the start of
their next period.
127
Since the number of replenishments is limited, if a node's scheduling context changes and the outgoing
scheduling context does not have enough space to store the new replenishment, space is created by removing
the current replenishment, which can result in preemption if the next replenishment is not yet available.
131Scheduling contexts with a higher number of configured refills will consume closer
132to their whole budget, as they can be preempted or switch threads more often without filling their
133replenishment queue. However, the scheduling overhead will be higher as the replenishment list is
134subject to fragmentation.
135
136Whenever a thread is executing it consumes the budget from its current scheduling context.  The
137system call \apifunc{seL4\_Yield}{sel4_yield} can be used to sacrifice any remaining budget and
138block until the next replenishment is ready to be used.
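For example, a periodic task might use \apifunc{seL4\_Yield}{sel4_yield} to give up the remainder of
its budget each iteration. A minimal sketch, where \texttt{do\_periodic\_work} is a placeholder for
application code:

\begin{verbatim}
#include <sel4/sel4.h>

extern void do_periodic_work(void);   /* hypothetical application work */

void periodic_loop(void)
{
    for (;;) {
        do_periodic_work();
        /* Sacrifice any remaining budget and block until the next
         * replenishment is ready, i.e. until the start of the next
         * period for a scheduling context with no extra refills. */
        seL4_Yield();
    }
}
\end{verbatim}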
139
Threads can be bound to scheduling contexts using \apifunc{seL4\_TCB\_Configure}{tcb_configure} or
\apifunc{seL4\_SchedContext\_Bind}{schedcontext_bind}. Both invocations have the same effect,
although \apifunc{seL4\_TCB\_Configure}{tcb_configure} allows more thread fields to be set with only
one kernel entry.  When a thread is bound to a scheduling context, if it is in a runnable state and
the scheduling context is active, it will be added to the scheduler.
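A minimal sketch of explicit binding, assuming \texttt{sc} and \texttt{tcb} are placeholder
capabilities to an already-configured scheduling context and a TCB:

\begin{verbatim}
#include <sel4/sel4.h>

extern seL4_CPtr sc, tcb;   /* placeholder capability slots */

void bind_sc(void)
{
    /* Bind the scheduling context to the thread; if the thread is
     * runnable and the scheduling context is active, the thread is
     * added to the scheduler. */
    seL4_Error err = seL4_SchedContext_Bind(sc, tcb);
    if (err != seL4_NoError) {
        /* handle the error, e.g. the SC or TCB is already bound */
    }
}
\end{verbatim}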
145
146\subsection{Passive Threads} \label{sec:passive}
147
148Threads can be unbound from a scheduling context with
149\apifunc{seL4\_SchedContext\_UnbindObject}{schedcontext_unbindobject}.  This is distinct from
150suspending a thread, in that threads that are blocked waiting in an endpoint or notification queue
will remain in the queue and can still receive messages and signals.  However, the unbound thread
152will not be schedulable again until it receives a scheduling context.  Threads without scheduling
153contexts are referred to as \emph{passive} threads, as they cannot execute without the action of
154another thread.
155
156\subsection{Scheduling Context Creation} \label{sec:sc_creation}
157
158Like other objects, scheduling contexts are created from untyped memory using
\apifunc{seL4\_Untyped\_Retype}{untyped_retype}.  On creation, scheduling contexts are empty,
160representing 0\% of CPU execution time.  To populate a scheduling context with parameters, one must
161invoke the appropriate \obj{SchedControl} capability, which provides access to CPU time management
162on a single node.  A scheduling control cap for each node is provided to the initial task at run
163time.  Threads run on the node that their scheduling context is configured for.  Scheduling context
164parameters can then be set and updated using
165\apifunc{seL4\_SchedControl\_Configure}{schedcontrol_configure}, which allows the budget and period
166to be specified.
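For illustration, a scheduling context might be populated as follows. This assumes the
\obj{SchedControl} capability for the target node is in \texttt{sched\_control}; the exact parameter
list of \apifunc{seL4\_SchedControl\_Configure}{schedcontrol_configure} (in particular the extra
refills and badge arguments) depends on the kernel version, so consult the generated libsel4 stub:

\begin{verbatim}
#include <sel4/sel4.h>

extern seL4_CPtr sched_control;   /* SchedControl cap for the target node */
extern seL4_CPtr sc;              /* freshly retyped scheduling context */

void configure_sc(void)
{
    seL4_Time budget = 1 * 1000;    /* b = 1ms, in microseconds */
    seL4_Time period = 10 * 1000;   /* p = 10ms, in microseconds */

    /* Give the scheduling context at most 10% of the node's CPU time
     * (trailing arguments assumed to be extra refills and a badge). */
    seL4_Error err = seL4_SchedControl_Configure(sched_control, sc,
                                                 budget, period,
                                                 0 /* extra refills */,
                                                 0 /* badge */);
    if (err != seL4_NoError) {
        /* handle error */
    }
}
\end{verbatim}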
167
168The kernel does not conduct any schedulability tests, as task admission is left to user-level policy
169and can be conducted online or offline, statically or dynamically or not at all.
170
171\subsection{Scheduling Context Donation}
172
173In addition to explicitly binding and removing scheduling contexts through
174\apifunc{seL4\_SchedContext\_Bind}{schedcontext_bind} and
175\apifunc{seL4\_SchedContext\_UnbindObject}{schedcontext_unbindobject}, scheduling contexts can move
176between threads over IPC.  Scheduling contexts are donated implicitly when the system calls
177\apifunc{seL4\_Call}{sel4_call} and \apifunc{seL4\_NBSendRecv}{sel4_nbsendrecv} are used to
178communicate with a passive thread.  When an active thread invokes an endpoint with
\apifunc{seL4\_Call}{sel4_call} and rendezvouses with a passive thread, the active thread's scheduling
180context is donated to the passive thread. The generated reply cap ensures that the callee is merely
181borrowing the scheduling context: when the reply cap is consumed by a reply message being sent the
182scheduling context will be returned to the caller.  If the reply cap is revoked, and the callee
183holds the scheduling context, the scheduling context will be returned to the caller.  However, if in
184a deep call chain and a reply cap in the middle of the call chain is revoked, such that the callee
185does not possess the scheduling context, the thread will be removed from the call chain and the
scheduling context will remain where it is.  If the receiver does not provide a reply object to
track the donation in (i.e.\ uses \apifunc{seL4\_Wait}{sel4_wait} instead of
\apifunc{seL4\_Recv}{sel4_recv}), scheduling context donation will not occur but the message will be
delivered. The passive receiver will remain inactive, as it does not have a scheduling context.
190
191Consider an example where thread A calls thread B which calls thread C.  If whilst C holds the
192scheduling context, B's reply cap to A is revoked, then the scheduling context will remain with C.
193However, a call chain will remain between A and C, such that if C's reply cap is revoked, or
194invoked, the scheduling context will return to A.
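As an illustration, a passive server typically blocks with an explicit reply object so that each
\apifunc{seL4\_Call}{sel4_call} lends it the caller's scheduling context for the duration of the
request. A minimal sketch, assuming \texttt{ep} and \texttt{reply} are placeholder endpoint and
reply object capabilities and \texttt{handle\_request} is a placeholder for the server's work:

\begin{verbatim}
#include <sel4/sel4.h>

extern seL4_CPtr ep, reply;   /* placeholder capability slots */
extern seL4_MessageInfo_t handle_request(seL4_Word badge,
                                         seL4_MessageInfo_t msg);

void passive_server(void)
{
    seL4_Word badge;

    /* Block on the endpoint; a caller's seL4_Call donates its scheduling
     * context, tracked by the reply object. */
    seL4_MessageInfo_t msg = seL4_Recv(ep, &badge, reply);

    for (;;) {
        seL4_MessageInfo_t resp = handle_request(badge, msg);
        /* Reply (returning the borrowed scheduling context to the caller)
         * and wait for the next request in one system call. */
        msg = seL4_ReplyRecv(ep, resp, &badge, reply);
    }
}
\end{verbatim}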
195
196\apifunc{seL4\_NBSendRecv}{sel4_nbsendrecv} can also result in scheduling context donation. 
197If the non-blocking send phase of the operation results in message delivery to a passive thread, the
198scheduling context will be donated to that passive thread and the thread making the system call becomes passive on the 
receiving endpoint in the receive phase.  No reply capability is generated, so there
is no guarantee that the scheduling context will be returned. This increases book-keeping complexity but allows
for data-flow-like architectures rather than remote procedure calls. Note that even \apifunc{seL4\_Call}{sel4_call}
does not guarantee the return of a scheduling context: calling a passive server is an inherently trusted operation,
as the server could simply never reply and thus never return the scheduling context.
204
205Scheduling contexts can also be bound to notification objects using
206\apifunc{seL4\_SchedContext\_Bind}{schedcontext_bind} and unbound using
207\apifunc{seL4\_SchedContext\_UnbindObject}{schedcontext_unbindobject}.  If a signal is delivered to
208a notification object with a passive thread blocked waiting on it, the passive thread will receive
209the scheduling context that is bound to the notification object.  The scheduling context is returned
when the thread blocks on the notification object.  This feature allows passive servers to use
notification binding (see \autoref{sec:notification-binding}).
212
213Scheduling contexts can be unbound from all objects (notification objects and TCBs that are bound or
214have received a scheduling context through donation) using
215\apifunc{seL4\_SchedContext\_Unbind}{schedcontext_unbind}.
216
Passive threads will run on the CPU node that the scheduling context they receive was configured for, and will
be migrated on IPC.
219
220\subsection{Scheduling algorithm} \label{sec:sched}
221
seL4 uses a preemptive, tickless scheduler with 256 priority levels (0--255).  All threads have
a maximum controlled priority (MCP) and a priority, the latter being the effective priority of the
thread.
When a thread modifies another thread's priority (including its own) it must provide a
thread capability from which the MCP is taken. Threads can only set priorities and MCPs
to be less than or equal to the provided thread's MCP.
The initial task starts with both its MCP and priority set to the highest priority in the system (\texttt{seL4\_MaxPrio}).
Thread priority and MCP can be
set with the \apifunc{seL4\_TCB\_SetSchedParams}{tcb_setschedparams},
\apifunc{seL4\_TCB\_SetPriority}{tcb_setpriority} and
\apifunc{seL4\_TCB\_SetMCPriority}{tcb_setmcpriority} methods.
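For example, a thread holding its own TCB capability with a sufficient MCP might lower the priority of
a worker thread as follows. This is a sketch: the capability names are placeholders, and the MCS
variant of \apifunc{seL4\_TCB\_SetSchedParams}{tcb_setschedparams} takes further arguments (such as a
scheduling context and fault endpoint), so check libsel4 for the exact stubs:

\begin{verbatim}
#include <sel4/sel4.h>

extern seL4_CPtr own_tcb, worker_tcb;   /* placeholder capability slots */

void demote_worker(void)
{
    /* own_tcb acts as the authority: the new priority must not exceed
     * its MCP. */
    seL4_Error err = seL4_TCB_SetPriority(worker_tcb, own_tcb, 42);
    if (err != seL4_NoError) {
        /* e.g. seL4_RangeError if 42 exceeds own_tcb's MCP */
    }
}
\end{verbatim}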
233
234Additionally, threads are only eligible for scheduling if they have an active scheduling context.
235Of threads eligible for scheduling, the highest priority thread in a runnable state is chosen.
236
237Thread priority (structure \texttt{seL4\_PrioProps\_t}) consists of two values as follows:
238
\begin{description}
\item[Priority] the priority a thread will be scheduled with.
\item[Maximum controlled priority (MCP)] the highest priority a thread can set itself or another thread to.
\end{description}
242
243Threads of sufficient maximum controlled priority and with possession of the appropriate scheduling
244context capability can manipulate the scheduler and implement user-level schedulers using IPC.
245
Scheduling contexts provide access to, and an upper bound on, CPU execution time, however when a thread
executes is determined by thread priority.
Consequently, access to the CPU is a function of thread MCPs, scheduling contexts and the
\obj{SchedControl} capability.  The kernel will enforce that threads do not exceed the budget in
their scheduling context for any given period, and that the highest priority runnable thread will always run,
however it is up to the system designer to make sure the entire system is schedulable.
252
253\subsection{Exceptions}
254\label{sec:exceptions}
255
256Each thread has two associated exception-handler endpoints, a \emph{standard}
257exception handler and a \emph{timeout} exception handler. If the thread causes
258an exception, the kernel creates an IPC message with the relevant details and
sends this to the endpoint. The handler thread can then take the appropriate action.
260Fault IPC messages are described in \autoref{sec:faults}.  Standard exception-handler
261endpoints can be set with the \apifunc{seL4\_TCB\_SetSpace}{tcb_setspace} or
\apifunc{seL4\_TCB\_SetSchedParams}{tcb_setschedparams} methods, while timeout exception
handlers can be set with \apifunc{seL4\_TCB\_SetTimeoutHandler}{tcb_settimeouthandler}.
264With these methods, a
265capability address for the exception handler can be associated with a thread.
266This address is then used to lookup the handler endpoint, and the capability to
the endpoint is installed into the thread's kernel CNode.  For threads without
an exception handler, a null capability can be used, however the consequences
differ per exception handler type.  Before raising an exception the
270handler capability is validated. The kernel does not perform another lookup,
271but checks that the capability is an endpoint with the correct rights.
272
273The exception endpoint must have send and grant rights. Replying to the
274exception message restarts the thread. For certain exception types, the
275contents of the reply message may be used to set the values in the registers of
276the thread being restarted.  See \autoref{sec:faults} for details.
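An exception handler is therefore an ordinary thread receiving on the handler endpoint. A minimal
sketch, assuming the handler holds that endpoint in \texttt{fault\_ep} and (on MCS kernels) a reply
object in \texttt{reply}; \texttt{fix\_up\_fault} is a placeholder that repairs the fault, e.g.\ by
mapping a page:

\begin{verbatim}
#include <sel4/sel4.h>

extern seL4_CPtr fault_ep, reply;   /* placeholder capability slots */
extern void fix_up_fault(seL4_Word badge, seL4_Word label);

void exception_handler(void)
{
    seL4_Word badge;
    seL4_MessageInfo_t msg = seL4_Recv(fault_ep, &badge, reply);

    for (;;) {
        /* The label identifies the fault type (see the Faults section). */
        seL4_Word label = seL4_MessageInfo_get_label(msg);
        fix_up_fault(badge, label);

        /* An empty reply restarts the faulting thread; for some fault
         * types the reply message can also update its registers. */
        seL4_MessageInfo_t resp = seL4_MessageInfo_new(0, 0, 0, 0);
        msg = seL4_ReplyRecv(fault_ep, resp, &badge, reply);
    }
}
\end{verbatim}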
277
278\subsubsection{Standard Exceptions}
279
280The standard exception handler is used when a fault is triggered by a thread
281which cannot be recovered without action by another thread.  For example, if a
282thread raises a fault due to an unmapped virtual memory page, the thread cannot
283make any more progress until the page is mapped.  If a thread experiences a
284fault that would trigger the standard exception handler while it is set to a
285null capability, the kernel will pause the thread and it will not run again.
286This is because without action by another thread, standard exceptions cannot be
287recovered from.  Consequently threads without standard exception handlers
288should be trusted not to fault at all.
289
290Standard exception handlers can be passive, in which case they will run on the
291scheduling context of the faulting thread.
292
293\subsubsection{Timeout Exceptions} \label{sec:timeout-exceptions}
294
Timeout faults are raised when a thread attempts to run but has no available
budget, provided that thread has a valid timeout exception handler capability.
The handling of timeout faults is not compulsory: if a thread does not have a
timeout fault handler, a fault will not be raised and the thread will continue
running when its budget is replenished.  This allows temporally sensitive
threads to handle budget overruns while other threads may ignore them.
301
302Timeout faults are registered per thread, which means that while clients may
303not have a timeout fault handler, servers may, allowing single-threaded,
304time-sensitive, passive servers to use a timeout exception handler to recover
305from malicious or untrusted clients whose budget expires while the server is
completing the request.  Timeout fault handlers can access the server's reply object to
reply with an error to the client, then reset the server to handle the next
client request.
309
If a reply message is sent to a nested server and a scheduling context without
available budget is returned, another timeout fault will be generated, provided the
nested server also has a timeout fault handler.
313
314Additionally, if the system criticality is changed while a thread with higher
315criticality than the system criticality is running on a scheduling context that
316is bound to a thread with criticality lower than the system criticality, a
317timeout exception will be raised.
318
319\subsection{Message Layout of the Read-/Write-Registers Methods}
320\label{sec:read_write_registers}
321
322The registers of a thread can be read and written with the
323\apifunc{seL4\_TCB\_ReadRegisters}{tcb_readregisters} and \apifunc{seL4\_TCB\_WriteRegisters}{tcb_writeregisters} methods.
324For some registers, the kernel will silently mask certain bits or ranges of bits off, and force them to contain certain
325values to ensure that they cannot be maliciously set to values that would compromise the running system, or to respect
326values that the architecture specifications have mandated to be certain values.
327The register contents are transferred via the IPC buffer.
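For instance, a debugger-style thread holding another thread's TCB capability could inspect its
registers as follows. This is a sketch: \texttt{target\_tcb} is a placeholder and the
\texttt{seL4\_UserContext} field names are architecture-specific:

\begin{verbatim}
#include <sel4/sel4.h>

extern seL4_CPtr target_tcb;   /* placeholder capability slot */

void dump_registers(void)
{
    seL4_UserContext regs;

    /* Suspend the target and copy all of its registers via the IPC
     * buffer; the second argument requests suspension of the source. */
    seL4_Error err = seL4_TCB_ReadRegisters(target_tcb, seL4_True, 0,
                                            sizeof(regs) / sizeof(seL4_Word),
                                            &regs);
    if (err == seL4_NoError) {
        /* regs.pc, regs.sp, etc. now hold the thread's register values
         * (field names differ between architectures). */
    }
}
\end{verbatim}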
328
329\section{Faults}
330\label{sec:faults}
331
332A thread's actions may result in a fault. Faults are delivered to the
333thread's exception handler so that it can take the appropriate action.
334The fault type is specified in the message label and is one of:
335seL4\_Fault\_CapFault, seL4\_Fault\_VMFault, seL4\_Fault\_UnknownSyscall,
336seL4\_Fault\_UserException, seL4\_Fault\_DebugException,
seL4\_Fault\_TimeoutFault, or seL4\_Fault\_NullFault (indicating no fault
occurred and this is a normal IPC message).
339
340Faults are delivered in such a way as to imitate a Call from the faulting
341thread. This means that to send a fault message the fault endpoint
342must have both write and grant permissions.
343
344\subsection{Capability Faults}
345
346Capability faults may occur in two places. Firstly, a capability fault
can occur when the lookup of a capability referenced by a
\apifunc{seL4\_Call}{sel4_call} or \apifunc{seL4\_Send}{sel4_send} system call
fails (\apifunc{seL4\_NBSend}{sel4_nbsend} calls on
invalid capabilities fail silently). In this case, the capability
351on which the fault occurred may be the capability being invoked or an
352extra capability passed in the \texttt{caps} field in the IPC buffer.
353
354Secondly, a capability fault can occur when \apifunc{seL4\_Recv}{sel4_recv} or \apifunc{seL4\_NBRecv}{sel4_nbrecv}
is called on a capability that does not exist, is not an endpoint or notification capability, or does not have
356receive permissions.
357
358Replying to the fault IPC will restart the faulting thread. The contents of the
359IPC message are given in \autoref{tbl:ipc_contents}.\\
360
361\begin{table}[htb]
362\noindent\begin{tabularx}{\textwidth}{XX}
363\toprule
    \textbf{Meaning} & \textbf{IPC buffer location} \\
365\midrule
366    Address at which to restart execution & \ipcbloc{seL4\_CapFault\_IP} \\
367    Capability address & \ipcbloc{seL4\_CapFault\_Addr} \\
368In receive phase (1 if the fault happened during a receive system call, 0
369    otherwise) & \ipcbloc{seL4\_CapFault\_InRecvPhase} \\
370Lookup failure description. As described in \autoref{sec:lookup_fail_desc} &
371    \ipcbloc{seL4\_CapFault\_LookupFailureType} \\
372\bottomrule
373\end{tabularx}
\caption{\label{tbl:ipc_contents}Contents of a capability fault IPC message.}
375\end{table}
376
377\subsection{Unknown Syscall}
378\label{sec:unknown-syscall}
379
380This fault occurs when a thread executes a system call with a syscall
381number that is unknown to seL4.
382The register set
383of the faulting thread is passed to the thread's exception handler so that it
384may, for example, emulate the system call if a thread is being
385virtualised.
386
387Replying to the fault IPC allows the thread to be restarted
388and/or the thread's register set to be modified. If the reply has
389a label of zero, the thread will be restarted. Additionally, if the
390message length is non-zero, the faulting thread's register set will be
391updated. In this case, the number of
392registers updated is controlled with the length field of the message
393tag.
394
395\subsection{User Exception}
396
397User exceptions are used to deliver architecture-defined exceptions. For
398example, such an exception could occur if a user thread attempted to
399divide a number by zero.
400
401Replying to the fault IPC allows the thread to be restarted
402and/or the thread's register set to be modified. If the reply has
403a label of zero, the thread will be restarted. Additionally, if the
404message length is non-zero, the faulting thread's register set will be
405updated. In this case, the number of
406registers updated is controlled with the length field of the message
407tag.
408
409\subsection{Debug Exception: Breakpoints and Watchpoints}
410\label{sec:debug_exceptions}
411
412Debug exceptions are used to deliver trace and debug related events to threads.
413Breakpoints, watchpoints, trace-events and instruction-performance sampling
414events are examples. These events are supported for userspace threads when the kernel
415is configured to include them (when CONFIG\_HARDWARE\_DEBUG\_API is set). The hardware
416debugging extensions API is supported on the following subset of the platforms that the
417kernel has been ported to:
418
419\begin{itemize}
420\item PC99: IA-32 and x86\_64
421\item Sabrelite (i.MX6)
422\item Jetson TegraK1
423\item HiSilicon Hikey
424\item Raspberry Pi 3
425\item Odroid-X (Exynos4)
426\item Xilinx zynq7000
427\end{itemize}
428
429Information on the available hardware debugging resources is presented in the form of the following constants:
430
431\begin{description}
432\item[seL4\_NumHWBreakpoints]: Defines the total number of hardware break
433registers available, of all types available on the hardware platform. On the ARM
434Cortex A7 for example, there are 6 exclusive instruction breakpoint registers,
435and 4 exclusive data watchpoint registers, for a total of 10 monitor registers.
436On this platform therefore, \texttt{seL4\_NumHWBreakpoints} is defined as 10.
437The instruction breakpoint registers will always be assigned the lower API-IDs,
438and the data watchpoints will always be assigned following them.
439
440Additionally, \texttt{seL4\_NumExclusiveBreakpoints}, \texttt{seL4\_NumExclusiveWatchpoints}
441and \texttt{seL4\_NumDualFunctionMonitors}
442are defined for each target platform to reflect the number of available
443hardware breakpoints/watchpoints of a certain type.
444
445\item[seL4\_NumExclusiveBreakpoints]: Defines the number of hardware registers
446capable of generating a fault \textbf{only} on instruction execution. Currently this will be
447set only on ARM platforms. The API-ID of the first exclusive breakpoint is given
448in \texttt{seL4\_FirstBreakpoint}. If there are no instruction-break exclusive
449registers, \texttt{seL4\_NumExclusiveBreakpoints} will be set to \texttt{0} and
450\texttt{seL4\_FirstBreakpoint} will be set to -1.
451
452\item[seL4\_NumExclusiveWatchpoints]: Defines the number of hardware registers
453capable of generating a fault \textbf{only} on data access. Currently this will be set only
454on ARM platforms. The API-ID of the first exclusive watchpoint is given
455in \texttt{seL4\_FirstWatchpoint}. If there are no data-break exclusive
456registers, \texttt{seL4\_NumExclusiveWatchpoints} will be set to \texttt{0} and
457\texttt{seL4\_FirstWatchpoint} will be set to -1.
458
459\item[seL4\_NumDualFunctionMonitors]: Defines the number of hardware registers
capable of generating a fault on either type of access -- i.e.\ the register
461supports both instruction and data breaks. Currently this will be set only on
462x86 platforms. The API-ID of the first dual-function monitor is given
463in \texttt{seL4\_FirstDualFunctionMonitor}. If there are no dual-function break
464registers, \texttt{seL4\_NumDualFunctionMonitors} will be set to \texttt{0} and
465\texttt{seL4\_FirstDualFunctionMonitor} will be set to -1.
466
467\end{description}
468
469\begin{table}[h]
\begin{tabularx}{\textwidth}{XX}
471\toprule
472\textbf{Value sent} & \textbf{IPC buffer location} \\
473\midrule
474\reg{Breakpoint instruction address} & \ipcbloc{IPCBuffer[0]} \\
475\reg{Exception reason} & \ipcbloc{IPCBuffer[1]} \\
476\reg{Watchpoint data access address} & \ipcbloc{IPCBuffer[2]} \\
477\reg{Register API-ID} & \ipcbloc{IPCBuffer[3]} \\
478\bottomrule
479\end{tabularx}
480\caption{\label{tbl:debug_exception_result}Debug fault message layout. The
481register API-ID is not returned in the fault message from the kernel on
482single-step faults.}
483\end{table}
484
485\subsection{Debug Exception: Single-stepping}
486\label{sec:single_stepping_debug_exception}
487
488The kernel provides support for the use of hardware single-stepping of userspace
489threads when configured to do so (when CONFIG\_HARDWARE\_DEBUG\_API is set). To
490this end it exposes the invocation, \texttt{seL4\_TCB\_ConfigureSingleStepping}.
491
The caller is expected to select an API-ID that corresponds to
an instruction breakpoint to use when setting up the single-stepping
functionality (i.e.\ an API-ID from 0 to \texttt{seL4\_NumExclusiveBreakpoints} - 1).
495However, not all hardware platforms require an actual hardware breakpoint
496register to provide single-stepping functionality. If the caller's hardware platform requires the
497use of a hardware breakpoint register, it will use the breakpoint register given to it in \texttt{bp\_num},
498and return \texttt{true} in \texttt{bp\_was\_consumed}. If the underlying platform does not need a hardware
499breakpoint to provide single-stepping, seL4 will return \texttt{false} in \texttt{bp\_was\_consumed} and
500leave \texttt{bp\_num} unchanged.
501
502If \texttt{bp\_was\_consumed} is \texttt{true}, the caller should not
503attempt to re-configure \texttt{bp\_num} for Breakpoint or Watchpoint usage until
504the caller has disabled single-stepping and released that register, via a subsequent
call to \texttt{seL4\_TCB\_ConfigureSingleStepping}, or a fault reply with
\texttt{num\_instructions} set to 0. Setting \texttt{num\_instructions} to \texttt{0}
\textbf{disables single-stepping}.
508
On architectures that require an actual hardware register to be configured for
single-stepping functionality, seL4 will restrict the number of registers that
can be configured for single-stepping to one at any given time. The register that
is currently configured (if any) for single-stepping will be the implicit
\texttt{bp\_num} argument in a single-step debug fault reply.
514
The kernel's single-stepping support also allows skipping a certain number of
instructions before delivering the single-step fault message. \texttt{num\_instructions}
should be set to \texttt{1} when single-stepping, or to any non-zero value to skip that many
instructions before resuming single-stepping. This skip count can also be set in
the fault reply to a single-step debug fault.
520
521\begin{table}[h]
522\begin{tabularx}{\textwidth}{XXX}
523\toprule
524\textbf{Value sent} & \textbf{Register set by reply} & \textbf{IPC buffer location} \\
525\midrule
526\reg{Breakpoint instruction address} & \texttt{num\_instructions} to skip & \ipcbloc{IPCBuffer[0]} \\
527\reg{Exception reason} & --- & \ipcbloc{IPCBuffer[1]} \\
528\bottomrule
529\end{tabularx}
530\caption{\label{tbl:single_step_exception_result}Single-step fault message layout.}
531\end{table}
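A hedged sketch of enabling single-stepping on a debuggee thread, assuming a kernel built with
CONFIG\_HARDWARE\_DEBUG\_API and that the generated libsel4 stub returns a struct containing
\texttt{error} and \texttt{bp\_was\_consumed} fields (\texttt{debuggee\_tcb} is a placeholder):

\begin{verbatim}
#include <sel4/sel4.h>

extern seL4_CPtr debuggee_tcb;   /* placeholder capability slot */

void enable_single_step(void)
{
    seL4_Uint16 bp_num = 0;      /* an instruction-breakpoint API-ID */

    /* Deliver a debug fault after every instruction (num_instructions = 1). */
    seL4_TCB_ConfigureSingleStepping_t res =
        seL4_TCB_ConfigureSingleStepping(debuggee_tcb, bp_num, 1);

    if (res.error == seL4_NoError && res.bp_was_consumed) {
        /* bp_num is now dedicated to single-stepping and must not be
         * reused for breakpoints until single-stepping is disabled. */
    }
}
\end{verbatim}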
532
533\subsection{Timeout Fault}
534\label{sec:timeout-fault}
535
Timeout faults are raised when a thread consumes all of its budget and has a
timeout fault handler that is not a null capability.  They allow a timeout
exception handler to take some action to restore the thread, and deliver a
message containing the scheduling context data word, as well as the amount of
time consumed since the last timeout fault occurred on this scheduling context,
or since \apifunc{seL4\_SchedContext\_YieldTo}{schedcontext_yieldto} or
\apifunc{seL4\_SchedContext\_Consumed}{schedcontext_consumed} was last called.
Timeout exception handlers can reply to a timeout fault with the registers set
in the same format as outlined in \autoref{sec:read_write_registers}.
545
\begin{table}[htb]
\noindent\begin{tabularx}{\textwidth}{XX}
\toprule
    \textbf{Meaning} & \textbf{IPC buffer location} \\
\midrule
    Data word from the scheduling context object that the thread was running on
    when the fault occurred. & \ipcbloc{seL4\_TimeoutFault\_Data} \\
    Upper 32 bits of microseconds consumed since last reset &
    \ipcbloc{seL4\_TimeoutFault\_Consumed} \\
    Lower 32 bits of microseconds consumed since last reset &
    \ipcbloc{seL4\_TimeoutFault\_Consumed\_LowBits} \\
\bottomrule
\end{tabularx}
\caption{\label{tbl:tf_message_32}Timeout fault outcome on 32-bit architectures.}
\end{table}
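A sketch of a timeout fault handler decoding this message on a 32-bit system, assuming the constants
in the table above name message-register indices readable with \texttt{seL4\_GetMR}. The capability
names and \texttt{restore\_thread} are placeholders, and the loop mirrors the exception handler
sketch earlier:

\begin{verbatim}
#include <sel4/sel4.h>

extern seL4_CPtr timeout_ep, reply;   /* placeholder capability slots */
extern void restore_thread(seL4_Word sc_data, seL4_Uint64 consumed_us);

void timeout_handler(void)
{
    seL4_Word badge;
    seL4_Recv(timeout_ep, &badge, reply);

    for (;;) {
        /* Decode the timeout fault message (32-bit layout from the table). */
        seL4_Word sc_data = seL4_GetMR(seL4_TimeoutFault_Data);
        seL4_Uint64 consumed =
            ((seL4_Uint64)seL4_GetMR(seL4_TimeoutFault_Consumed) << 32) |
            seL4_GetMR(seL4_TimeoutFault_Consumed_LowBits);

        restore_thread(sc_data, consumed);

        /* Reply to restart the faulting thread and wait for the next fault. */
        seL4_ReplyRecv(timeout_ep, seL4_MessageInfo_new(0, 0, 0, 0),
                       &badge, reply);
    }
}
\end{verbatim}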
555
556\subsection{VM Fault}
557\label{sec:vm-fault}
558
559The thread caused a page fault. Replying to the fault IPC will restart
560the thread. The contents of the IPC message are given below.\\
561
562\begin{table}[htb]
\begin{tabularx}{\textwidth}{XX}
564\toprule
565\textbf{Meaning} & \textbf{IPC buffer location} \\
566\midrule
567    Program counter to restart execution at. & \ipcbloc{seL4\_VMFault\_IP} \\
568Address that caused the fault. & \ipcbloc{seL4\_VMFault\_SP} \\
569    Instruction fault (1 if the fault was caused by an instruction fetch). & \ipcbloc{seL4\_VMFault\_PrefetchFault}  \\
570Fault status register (FSR). Contains information about the cause of the fault. Architecture dependent. & \ipcbloc{seL4\_VMFault\_FSR} \\
571\bottomrule
572\end{tabularx}
573\caption{\label{tbl:vm_fault_result_arm} VM Fault outcome on all architectures.}
574\end{table}
575
576\section{Domains}
577\label{sec:domains}
578
579Domains are used to isolate independent subsystems, so as to limit
580information flow between them.
581The kernel switches between domains according to a fixed, time-triggered
582schedule.
583The fixed schedule is compiled into the kernel via the constant
584\texttt{CONFIG\_NUM\_DOMAINS} and the global variable \texttt{ksDomSchedule}.
585
586A thread belongs to exactly one domain, and will only run when that domain
587is active.
588The \apifunc{seL4\_DomainSet\_Set}{domainset_set} method changes the domain
589of a thread.
590The caller must possess a \obj{Domain} cap and the thread's \obj{TCB} cap.
591The initial thread starts with a \obj{Domain} cap (see
592\autoref{sec:messageinfo}).
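For example, moving a thread into domain 1 might look as follows, assuming the caller holds the
\obj{Domain} capability in \texttt{domain\_cap} and the thread's TCB capability in \texttt{tcb}
(both placeholder slot names):

\begin{verbatim}
#include <sel4/sel4.h>

extern seL4_CPtr domain_cap, tcb;   /* placeholder capability slots */

void move_to_domain_one(void)
{
    /* The thread will now only run when domain 1 is active in the
     * fixed, time-triggered domain schedule. */
    seL4_Error err = seL4_DomainSet_Set(domain_cap, 1, tcb);
    if (err != seL4_NoError) {
        /* e.g. the domain index is outside CONFIG_NUM_DOMAINS */
    }
}
\end{verbatim}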
593
594\section{Virtualisation}
595\label{sec:virt}
596
Hardware execution virtualisation is supported on specific ARM and x86 platforms. The interface is exposed through a series
598of kernel objects, invocations and syscalls that allow the user to take advantage of hardware
599virtualisation features.
600
601Hardware virtualisation allows for a thread to perform instructions and operations as if it were
602running at a higher privilege level. As higher privilege levels typically have access to
additional machine registers and other pieces of state, a \obj{VCPU} object is introduced to act
604as storage for this state. For simplicity we refer to this virtualised higher privileged level as
605'guest mode'. \obj{VCPU}s are bound in a one-to-one relationship with a \obj{TCB} in order
606to provide a thread with this ability to run in higher privilege mode. See the section on
607ARM or x86 for more precise details.
608
609\obj{VCPU} objects also have additional, architecture specific, invocations for manipulating
610the additional state or other virtualisation controls provided by the hardware. Binding of
611a \obj{VCPU} to a \obj{TCB} is done by an invocation on the \obj{VCPU} only, and not the \obj{TCB}.
612
613The provided objects and invocations are, generally speaking, the thinnest possible shim over
the underlying hardware primitives and operations. As a result, an in-depth familiarity with
the underlying architecture-specific hardware mechanisms is required to use these objects, and
such familiarity is therefore assumed in this description.
617
618\subsection{ARM}
619
620When a \obj{TCB} has a bound \obj{VCPU} it is allowed to have the mode portion of the
621\texttt{cpsr} register set to values other than \texttt{user}. Specifically it may have any value other than
622\texttt{hypervisor}.
623
624TODO: this section needs more detail
625
626\subsection{x86}
627
628A \obj{TCB} with a bound \obj{VCPU} has two execution modes; one is the original thread just as
629if there was no bound \obj{VCPU}, and the other is the guest mode execution using the
630\obj{VCPU}. Switching from regular execution mode into the guest execution mode is
done by using the \apifunc{seL4\_VMEnter}{sel4_vmenter} syscall. Executing this syscall causes the thread, whenever
it is scheduled thereafter, to execute using the higher privileged mode controlled by the \obj{VCPU}.
Should the guest execution mode generate any kind of fault, or if a message arrives
on the \obj{TCB}'s bound notification, the \obj{TCB} will be switched back to regular mode
and the \apifunc{seL4\_VMEnter}{sel4_vmenter} syscall will return with a message indicating the reason for the return.
636
637\obj{VCPU} state and execution is controlled through the \apifunc{seL4\_VCPU\_ReadVMCS}{x86_vcpu_readvmcs}
638and \apifunc{seL4\_VCPU\_WriteVMCS}{x86_vcpu_writevmcs} invocations.
639These are very thin wrappers around the hardware \texttt{vmread} and \texttt{vmwrite} instructions and the kernel
640merely does enough validation on the parameters to ensure the \obj{VCPU} is not configured
641to run in such a way as to violate any kernel properties. For example, it is not possible to
642disable the use of External Interrupt Exiting, as this would prevent the kernel from receiving
643timer interrupts and allow the thread to monopolise CPU time.
644
645Memory access of the guest execution mode is controlled by requiring the use of Extended
646Page Tables (EPT). A series of EPT related paging structure objects (\obj{EPTPML4}, \obj{EPTPDPT}, \obj{EPTPD}, \obj{EPTPT})
647exist and are manipulated in exactly the same manner as the objects for the regular virtual
address space. Once constructed, a \obj{TCB} can be given an \obj{EPTPML4} as an EPT root with \apifunc{seL4\_TCB\_SetEPTRoot}{x86_set_eptroot},
which serves as the vspace root when executing in guest mode, with the vspace root set
with \apifunc{seL4\_TCB\_SetSpace}{tcb_setspace} or \apifunc{seL4\_TCB\_Configure}{tcb_configure}
continuing to provide translation when the TCB is executing in its normal mode.
652
Direct access to I/O ports can be given to the privileged execution mode through the
\apifunc{seL4\_X86\_VCPU\_EnableIOPort}{x86_vcpu_enableioport} invocation, which links the provided I/O port capability
to the \obj{VCPU} and makes a subset of its I/O port range accessible to the \obj{VCPU}.
656Linking means that an I/O port capability can only be used in a single \apifunc{seL4\_X86\_VCPU\_EnableIOPort}{x86_vcpu_enableioport}
657invocation and a second invocation will undo the previous one. The link also means that
658if the I/O port capability is deleted for any reason the access will be correspondingly removed
659from the \obj{VCPU}.
660