1%
2% Copyright 2014, General Dynamics C4 Systems
3%
4% This software may be distributed and modified according to the terms of
5% the GNU General Public License version 2. Note that NO WARRANTY is provided.
6% See "LICENSE_GPLv2.txt" for details.
7%
8% @TAG(GD_GPL)
9%
10
11\chapter{\label{ch:objects}Kernel Services and Objects}
12
13A limited number of service primitives are provided by the microkernel;
14more complex services may be implemented as applications on top of these
15primitives. In this way, the functionality of the system can be extended
16without increasing the code and complexity in privileged mode, while
17still supporting a potentially wide number of services for varied
18application domains.
19
20The basic services seL4 provides are as follows:
21\begin{description}
22    \item[Threads] are an abstraction of CPU execution that supports
23    running software;
24
25    \item[Address spaces] are virtual memory spaces that each contain an
26    application. Applications are limited to accessing memory in their
27    address space;
28
29    \item[Inter-process communication] (IPC) via \emph{endpoints} allows
30    threads to communicate using message passing;
31
32    \item[Notifications] provide a non-blocking signalling mechanism
33      similar to binary semaphores;
34
35    \item[Device primitives] allow device drivers to be implemented as
36    unprivileged applications.  The kernel exports hardware device
37    interrupts via IPC messages; and
38
39    \item[Capability spaces] store capabilities (i.e., access rights) to
40    kernel services along with their book-keeping information.
41\end{description}
42
43This chapter gives an overview of these services, describes how kernel
44objects are accessed by userspace applications, and describes how new
45objects can be created.
46
47\section{Capability-based Access Control}
48\label{sec:cap-access-control}
49
50The seL4 microkernel provides a capability-based access-control model.
51Access control governs all kernel services; in order to perform an
52operation, an application must \emph{invoke} a capability in its
53possession that has sufficient access rights for the requested service.
54With this, the system can be configured to isolate software components from
55each other, and also to enable authorised, controlled communication
56between components by selectively granting specific communication
57capabilities.  This enables software-component isolation with a high
58degree of assurance, as only those operations explicitly authorised by
59capability possession are permitted.
60
61A capability is an unforgeable token that references a specific kernel
62object (such as a thread control block) and carries access rights that
63control what methods may be invoked.
64Conceptually, a capability resides in an application's \emph{capability
65space}; an address in this space refers to a \emph{slot} which may or
66may not contain a capability.  An application may refer to
67a capability---to request a kernel service, for example---using the
68address of the slot holding that capability.  This means, the seL4 
69capability model is an instance of a \emph{segregated} (or \emph{partitioned})
70capability system, where capabilities are managed by the kernel.
71
72Capability spaces are implemented as a directed graph of kernel-managed
73\emph{capability nodes} (\obj{CNode}s).  A \obj{CNode} is a table of
74slots, where each slot may contain further \obj{CNode} capabilities. An
75address of a capability in a capability space is the concatenation of the indices
76of slots within \obj{CNodes} forming the path to the destination
77slot; we discuss \obj{CNode} objects in detail in \autoref{ch:cspace}.
78
79Capabilities can be copied and moved within capability spaces, and
80also sent via IPC. This allows creation of applications with specific
81access rights, the delegation of authority to another application, and
82passing to an application authority to a newly created (or selected)
83kernel service. Furthermore, capabilities can be \emph{minted} to
84create a derived capability with a subset of the rights of the
85original capability (never with more rights). A newly minted
86capability can be used for partial delegation of authority.
87
88Capabilities can also be revoked to withdraw
89authority. Revocation recursively removes any capabilities that have
90been derived from the original capability being revoked. The propagation of
91capabilities through the system is controlled by a
92\emph{take-grant}-based model~\cite{Elkaduwe_GE_08,Boyton_09}.
93
94\section{System Calls}
95\label{sec:syscalls}
96\label{sec:sys_send}
97\label{sec:sys_recv}
98\label{sec:sys_call}
99\label{sec:sys_reply}
100\label{sec:sys_nbsend}
101\label{sec:sys_replyrecv}
102\label{sec:sys_nbrecv}
103\label{sec:sys_yield}
104
105The seL4 kernel provides a message-passing service for communication between
106threads. This mechanism is also used for communication with kernel-provided
107services. There is a standard message format, each message containing a
108number of data words and possibly some capabilities. The structure and encoding
109of these messages are described in detail in \autoref{ch:ipc}. 
110
111Threads send messages by invoking capabilities within their capability space.
112When an endpoint capability is invoked in this way, the message will be
113transferred through the kernel to another thread. When capabilities to kernel
114objects are invoked, the message will be interpreted as a method invocation in a
115manner specific to the type of kernel object. For example, invoking a thread
116control block (TCB) capability with a correctly formatted message will suspend
117the target thread.
118
119Logically, the kernel provides three system calls, \emph{Send},
120\emph{Receive} and \emph{Yield}. However, there are also combinations
121and variants of the basic \emph{Send} and \emph{Receive} calls, e.g.\
122the \emph{Call} operation, which consists of a send followed by a
123\emph{Receive} from the same object. Methods on kernel objects other
124than endpoints and notifications are all mapped to \emph{Send} or
125\emph{Call}, depending on whether or not the method returns a
126result. The \emph{Yield} system call is not associated with any kernel
127object and is the only operation that does not invoke a capability.
128
129The complete set of system calls is:
130\begin{description}
131    \item[\apifunc{seL4\_Send}{sel4_send}] delivers a message through the named
132    capability and the application to continue. If the invoked
133    capability is an endpoint, and no receiver is ready to receive the message
134    immediately, the sending thread will block until the message can be
135    delivered. No error code or response will be returned by the receiving
136    object.
137
138    \item[\apifunc{seL4\_NBSend}{sel4_nbsend}] performs a polling send
139      on an endpoint. It
140    is similar to \apifunc{seL4\_Send}{sel4_send}, except that it is
141    guaranteed not to block. If the message
142    cannot be delivered immediately, i.e.\ there is no receiver waiting
143    on the destination \obj{Endpoint}, the message is silently dropped. Like
144    \apifunc{seL4\_Send}{sel4_send}, no error code or response will be returned.
145
146    \item[\apifunc{seL4\_Call}{sel4_call}] combines \apifunc{seL4\_Send}{sel4_send}
147      and \apifunc{seL4\_Recv}{sel4_recv}. The call
148      blocks the sending thread until its message is delivered and a reply message is received. When the
149    sent message is delivered to another thread (via an
150    \obj{Endpoint}), the kernel adds an
151    additional `\emph{reply}' capability to the message that is delivered
152    to the receiver, giving the latter the right to reply to the original sender. The reply
153    capability is deposited in a dedicated slot in the receiver's
154    \obj{TCB}, and is a single-use right, meaning that the kernel
155    invalidates it as soon as it has been invoked.
156
157    The \apifunc{seL4\_Call}{sel4_call} operation exists not only for
158    efficiency reasons (combining two operations into a single system
159    call). It differs from
160    \apifunc{seL4\_Send}{sel4_send} immediately followed by
161    \apifunc{seL4\_Recv}{sel4_recv} in two ways:
162    \begin{enumerate}
163    \item the single-use reply capability is created to establish a 
164      reply channel with minimal trust;
165    \item the transition from send to recv phase is atomic, meaning it
166      cannot be preempted, and the receiver can reply without any risk
167      of blocking.
168    \end{enumerate}
169
170    When invoking capabilities to kernel services, using
171    \apifunc{seL4\_Call}{sel4_call} allows the kernel to return an error code
172    or other response through the reply message.
173
174    \item[\apifunc{seL4\_Recv}{sel4_recv}] is used by a thread to receive
175    messages through endpoints or notifications. If no sender or
176    notification is pending, the caller
177    will block until a message or notification can be delivered. This system call works only on
178    \obj{Endpoint} or \obj{Notification} capabilities, raising a fault (see section \ref{sec:faults}) when
179    attempted with other capability types.
180
181    \item[\apifunc{seL4\_Reply}{sel4_reply}] is used to respond to a
182    \apifunc{seL4\_Call}{sel4_call}, using the reply capability generated by the
183    \apifunc{seL4\_Call}{sel4_call} system call and stored in the replying
184    thread's TCB. It delivers the message to the thread that invoked
185    the \apifunc{seL4\_Call}{sel4_call}, waking it in
186    the process.
187
188    There is space for only one reply capability in each thread's TCB, so the
189    \apifunc{seL4\_Reply}{sel4_reply} syscall can be used to reply to the most
190    recent caller only. The \apifunc{seL4\_CNode\_SaveCaller}{cnode_savecaller}
191    method that will be described later can be used to save the reply
192    capability into regular capability space, where it can be used with
193    \apifunc{seL4\_Send}{sel4_send}.
194
195    \item[\apifunc{seL4\_ReplyRecv}{sel4_replyrecv}] combines \apifunc{seL4\_Reply}{sel4_reply} and
196    \apifunc{seL4\_Recv}{sel4_recv}. It exists mostly for efficiency reasons: the common case of
197    replying to a request and waiting for the next can be performed in
198    a single kernel system call instead of two. The transition from
199    the reply to the receive phase is also atomic.
200
201    \item[\apifunc{seL4\_NBRecv}{sel4_nbrecv}] is used by a thread to check for
202    signals pending on a notification object or messages pending on an endpoint without blocking.
203    This system call works only on endpoints and notification object 
204    capabilities, raising a fault (see section \ref{sec:faults}) when attempted
205    with other capability types.
206
207    \item[\apifunc{seL4\_Yield}{sel4_yield}] is the only system call that does not require
208    a capability to be used. It forfeits the remainder of the calling thread's
209    timeslice and causes invocation of the kernel's scheduler.
210    If there are no other runnable threads with the same
211    priority as the caller, the calling thread will immediately be
212    scheduled with a fresh timeslice.
213\end{description}
214
215\section{Kernel Objects}
216\label{s:sel4_internals}
217
218In this section we give a brief overview of the kernel-implemented
219object types whose instances (also simply called \emph{objects}) can be invoked by applications. The interface to these
220objects forms the interface to the kernel itself. The creation and use
221of kernel services is achieved by the creation,
222manipulation, and combination of these kernel objects:
223
224\begin{description}
225
226    \item[\obj{CNodes}] (see \autoref{ch:cspace}) store capabilities, giving threads permission to
227    invoke methods on particular objects.
228    Each \obj{CNode} has a fixed number of slots,
229    always a power of two, determined when the \obj{CNode} is created. Slots
230    can be empty or contain a capability.
231
232    \item[Thread Control Blocks] (\obj{TCB}s; see \autoref{ch:threads}) represent a thread of
233    execution in seL4. Threads are the unit of execution that is
234    scheduled, blocked, unblocked, etc., depending on the application's
235    interaction with other threads.
236
237    \item[\obj{Endpoints}] (see \autoref{ch:ipc}) facilitate message-passing
238    communication between threads. IPC is synchronous: A thread
239    trying to send or receive on an endpoint blocks until the message
240    can be delivered. This means that message delivery only happens if
241    a sender and a receiver rendezvous at the endpoint, and the
242    kernel can deliver the message with a single copy (or without
243    copying for short messages using only registers).
244
245    A capability to an endpoint can be restricted to be
246    send-only or receive-only. Additionally, \obj{Endpoint}
247    capabilities can have the grant right, which allows sending
248    capabilities as part of the message.
249
250    \item[\obj{Notification} Objects] (see \autoref{ch:notifications})
251      provide a simple signalling mechanism. A \obj{Notification}
252      is a word-size array of flags, each of which behaves like a binary semaphore. Operations
253      are \emph{signalling} a subset of flags in a single operation,
254      polling to check any flags, 
255      and blocking until any are signalled. Notification capabilities
256      can be signal-only or wait-only.
257
258    \item[Virtual Address Space Objects] (see \autoref{ch:vspace}) 
259    are used to construct a virtual
260    address space (or VSpace) for one or more threads. These
261    objects largely directly correspond to those of the hardware, and
262    as such are architecture-dependent. The kernel also includes \obj{ASID
263    Pool} and \obj{ASID Control} objects for tracking the status of
264    address spaces.
265
266    \item[Interrupt Objects] (see \autoref{ch:io}) give applications the ability to receive
267    and acknowledge interrupts from hardware devices.
268    Initially, there is a capability to \obj{IRQControl},
269    which allows for the creation of \obj{IRQHandler} capabilities.
270    An \obj{IRQHandler} capability permits the management of a specific 
271    interrupt source associated with a specific device.
272    It is delegated to
273    a device driver to access an interrupt source. The \obj{IRQHandler}
274    object allows threads to wait for and acknowledge individual
275    interrupts.
276
277    \item[Untyped Memory] (see \autoref{sec:kernmemalloc}) is the foundation of memory allocation
278    in the seL4 kernel.  Untyped memory capabilities have a single method
279    which allows the creation of new kernel objects. If the method
280    succeeds, the calling thread gains access to capabilities to the
281    newly-created objects. Additionally, untyped memory objects can be
282    divided into a group of smaller untyped memory objects allowing
283    delegation of part (or all) of the system's memory.  We discuss
284    memory management in general in the following sections.
285
286\end{description}
287
288\section{Kernel Memory Allocation}
289\label{sec:kernmemalloc}
290
291The seL4 microkernel does not dynamically allocate memory for kernel objects.
292Instead, objects must be explicitly created from application-controlled memory
293regions via \obj{Untyped Memory} capabilities.  Applications must have
294explicit authority to memory (through these \obj{Untyped Memory} capabilities) in
295order to create new objects, and all objects consume a fixed amount of memory once
296created. These mechanisms can be used to precisely control
297the specific amount of physical memory available to applications,
298including being able to enforce isolation of physical memory access
299between applications or a device.  There are no arbitrary resource
300limits in the kernel apart from those dictated by the
301hardware\footnote{The treatment of virtual ASIDs imposes a fixed number
302of address spaces. This limitation is to be removed in future
303versions of seL4.}, and so many denial-of-service attacks via resource
304exhaustion are avoided.
305
306At boot time, seL4 pre-allocates the memory required for the kernel
307itself, including the code, data, and stack sections (seL4 is a single
308kernel-stack operating system). It then creates an initial user
309thread (with an appropriate address and capability space).
310The kernel then hands all remaining memory to
311the initial thread in the form of capabilities to \obj{Untyped Memory}, and
312some additional capabilities to kernel objects that were required to
313bootstrap the initial thread.  These \obj{Untyped Memory} regions can then be split into
314smaller regions or other kernel objects using the
315\apifunc{seL4\_Untyped\_Retype}{untyped_retype} method; the created objects are termed \emph{children} of
316the original untyped memory object.
317
318The user-level application that creates an object using \apifunc{seL4\_Untyped\_Retype}{untyped_retype}
319receives full authority over the resulting object. It can then delegate
320all or part of the authority it possesses over this object to one or
321more of its clients.
322
323Untyped memory objects represent two different types of memory:
324general purpose memory, or device memory.
325\emph{General purpose} memory can be untyped into any other object
326type and used for any operation on untyped memory provided by the kernel.
327\emph{Device memory} covers memory regions reserved for devices
328as determined by the hardware platform, and usage of these objects
329is restricted by the kernel in the following ways:
330
331\begin{itemize}
332\item Device untyped objects can only be retyped into frames or other
333untyped objects; developers cannot, for example, create an endpoint from device memory.
334\item Frame objects retyped from device untyped objects cannot be set as thread IPC buffers, or used
335in the creation of an ASID pool
336\end{itemize}
337
338The type attribute (whether it represents \emph{general purpose} or
339\emph{device} memory) of a child untyped object is inherited from its
340parent untyped object. That is, any child of a device untyped will
341also be a device untyped. Developers cannot change the type attribute of an untyped.
342
343\subsection{Reusing Memory}
344\label{s:memRevoke}
345
346The model described thus far is sufficient for applications to
347allocate kernel objects, distribute authority among client
348applications, and obtain various kernel services provided by these
349objects.  This alone is sufficient for a simple static system
350configuration.
351
352The seL4 kernel also allows \obj{Untyped Memory} regions to be reused.
353Reusing a region of memory is allowed only
354when there are no dangling references (i.e., capabilities) left to the
355objects inside that memory.  The kernel tracks
356\emph{capability derivations}, i.e., the children generated by the
357methods \apifunc{seL4\_Untyped\_Retype}{untyped_retype}, \apifunc{seL4\_CNode\_Mint}{cnode_mint}, \apifunc{seL4\_CNode\_Copy}{cnode_copy}, and
358\apifunc{seL4\_CNode\_Mutate}{cnode_mutate}.
359
360The tree structure so generated is termed the \emph{capability
361derivation tree} (CDT).\footnote{Although the CDT conceptually is a separate
362data structure, it is implemented as part of the CNode object and so
363requires no additional kernel meta-data.}  For example, when a user
364creates new kernel objects by retyping untyped memory, the newly created
365capabilities would be inserted into the CDT as children of the untyped
366memory capability.
367
368For each \obj{Untyped Memory} region, the kernel keeps
369a \emph{watermark} recording how much of the region has previously been
370allocated. Whenever a user requests the kernel to create new objects in
371an untyped memory region, the kernel will carry out one of two actions:
372if there are already existing objects allocated in the region, the
373kernel will allocate the new objects at the current watermark level, and
374increase the watermark. If all objects previously allocated in the
375region have been deleted, the kernel will reset the watermark and start
376allocating new objects from the beginning of the region again.
377
378Finally, the \apifunc{seL4\_CNode\_Revoke}{cnode_revoke} method provided by \obj{CNode} objects
379destroys all capabilities derived from the argument capability. Revoking
380the last capability to a kernel object triggers the \emph{destroy}
381operation on the now unreferenced object. This simply cleans up any in-kernel dependencies between
382it, other objects and the kernel.
383
384By calling \apifunc{seL4\_CNode\_Revoke}{cnode_revoke} on the original capability to an untyped memory
385object, the user removes all of the untyped memory object's
386children---that is, all capabilities pointing to objects in the untyped
387memory region.  Thus, after this invocation there are no valid references
388to any object within the untyped region, and the region may be safely
389retyped and reused.
390
391\subsection{Summary of Object Sizes}
392\label{sec:object_sizes}
393
394When retyping untyped memory it is useful to know how much memory the
395object will require. Object sizes are defined in libsel4.
396
397Note that \obj{CNode}s and \obj{Untyped Object}s have variables sizes.
398When retyping untyped memory into \obj{CNode}s or breaking an
399\obj{Untyped Object} into smaller \obj{Untyped Object}s, the
400\texttt{size\_bits} argument to
401\apifunc{seL4\_Untyped\_Retype}{untyped_retype} is used to specify
402the size of the resulting objects.
403For all other object types, the size is fixed, and the \texttt{size\_bits}
404argument to \apifunc{seL4\_Untyped\_Retype}{untyped_retype} is ignored.
405
406A single call to \apifunc{seL4\_Untyped\_Retype}{untyped_retype} can retype a
407single \obj{Untyped Object} into multiple objects. The number of objects
408to create is specified by its \texttt{num\_objects} argument. All created
409objects must be of the same type, specified by the \texttt{type} argument. In
410the case of variable-sized objects, each object must also be of the same size.
411If the size of the memory area needed (calculated by the object size multiplied
412by \texttt{num\_objects}) is greater than the remaining unallocated memory of
413the \obj{Untyped Object}, an error will result.
414