1% 2% Copyright 2014, General Dynamics C4 Systems 3% 4% This software may be distributed and modified according to the terms of 5% the GNU General Public License version 2. Note that NO WARRANTY is provided. 6% See "LICENSE_GPLv2.txt" for details. 7% 8% @TAG(GD_GPL) 9% 10 11\chapter{\label{ch:objects}Kernel Services and Objects} 12 13A limited number of service primitives are provided by the microkernel; 14more complex services may be implemented as applications on top of these 15primitives. In this way, the functionality of the system can be extended 16without increasing the code and complexity in privileged mode, while 17still supporting a potentially wide number of services for varied 18application domains. 19 20The basic services seL4 provides are as follows: 21\begin{description} 22 \item[Threads] are an abstraction of CPU execution that supports 23 running software; 24 25 \item[Address spaces] are virtual memory spaces that each contain an 26 application. Applications are limited to accessing memory in their 27 address space; 28 29 \item[Inter-process communication] (IPC) via \emph{endpoints} allows 30 threads to communicate using message passing; 31 32 \item[Notifications] provide a non-blocking signalling mechanism 33 similar to binary semaphores; 34 35 \item[Device primitives] allow device drivers to be implemented as 36 unprivileged applications. The kernel exports hardware device 37 interrupts via IPC messages; and 38 39 \item[Capability spaces] store capabilities (i.e., access rights) to 40 kernel services along with their book-keeping information. 41\end{description} 42 43This chapter gives an overview of these services, describes how kernel 44objects are accessed by userspace applications, and describes how new 45objects can be created. 46 47\section{Capability-based Access Control} 48\label{sec:cap-access-control} 49 50The seL4 microkernel provides a capability-based access-control model. 51Access control governs all kernel services; in order to perform an 52operation, an application must \emph{invoke} a capability in its 53possession that has sufficient access rights for the requested service. 54With this, the system can be configured to isolate software components from 55each other, and also to enable authorised, controlled communication 56between components by selectively granting specific communication 57capabilities. This enables software-component isolation with a high 58degree of assurance, as only those operations explicitly authorised by 59capability possession are permitted. 60 61A capability is an unforgeable token that references a specific kernel 62object (such as a thread control block) and carries access rights that 63control what methods may be invoked. 64Conceptually, a capability resides in an application's \emph{capability 65space}; an address in this space refers to a \emph{slot} which may or 66may not contain a capability. An application may refer to 67a capability---to request a kernel service, for example---using the 68address of the slot holding that capability. This means, the seL4 69capability model is an instance of a \emph{segregated} (or \emph{partitioned}) 70capability system, where capabilities are managed by the kernel. 71 72Capability spaces are implemented as a directed graph of kernel-managed 73\emph{capability nodes} (\obj{CNode}s). A \obj{CNode} is a table of 74slots, where each slot may contain further \obj{CNode} capabilities. An 75address of a capability in a capability space is the concatenation of the indices 76of slots within \obj{CNodes} forming the path to the destination 77slot; we discuss \obj{CNode} objects in detail in \autoref{ch:cspace}. 78 79Capabilities can be copied and moved within capability spaces, and 80also sent via IPC. This allows creation of applications with specific 81access rights, the delegation of authority to another application, and 82passing to an application authority to a newly created (or selected) 83kernel service. Furthermore, capabilities can be \emph{minted} to 84create a derived capability with a subset of the rights of the 85original capability (never with more rights). A newly minted 86capability can be used for partial delegation of authority. 87 88Capabilities can also be revoked to withdraw 89authority. Revocation recursively removes any capabilities that have 90been derived from the original capability being revoked. The propagation of 91capabilities through the system is controlled by a 92\emph{take-grant}-based model~\cite{Elkaduwe_GE_08,Boyton_09}. 93 94\section{System Calls} 95\label{sec:syscalls} 96\label{sec:sys_send} 97\label{sec:sys_recv} 98\label{sec:sys_call} 99\label{sec:sys_reply} 100\label{sec:sys_nbsend} 101\label{sec:sys_replyrecv} 102\label{sec:sys_nbrecv} 103\label{sec:sys_yield} 104 105The seL4 kernel provides a message-passing service for communication between 106threads. This mechanism is also used for communication with kernel-provided 107services. There is a standard message format, each message containing a 108number of data words and possibly some capabilities. The structure and encoding 109of these messages are described in detail in \autoref{ch:ipc}. 110 111Threads send messages by invoking capabilities within their capability space. 112When an endpoint capability is invoked in this way, the message will be 113transferred through the kernel to another thread. When capabilities to kernel 114objects are invoked, the message will be interpreted as a method invocation in a 115manner specific to the type of kernel object. For example, invoking a thread 116control block (TCB) capability with a correctly formatted message will suspend 117the target thread. 118 119Logically, the kernel provides three system calls, \emph{Send}, 120\emph{Receive} and \emph{Yield}. However, there are also combinations 121and variants of the basic \emph{Send} and \emph{Receive} calls, e.g.\ 122the \emph{Call} operation, which consists of a send followed by a 123\emph{Receive} from the same object. Methods on kernel objects other 124than endpoints and notifications are all mapped to \emph{Send} or 125\emph{Call}, depending on whether or not the method returns a 126result. The \emph{Yield} system call is not associated with any kernel 127object and is the only operation that does not invoke a capability. 128 129The complete set of system calls is: 130\begin{description} 131 \item[\apifunc{seL4\_Send}{sel4_send}] delivers a message through the named 132 capability and the application to continue. If the invoked 133 capability is an endpoint, and no receiver is ready to receive the message 134 immediately, the sending thread will block until the message can be 135 delivered. No error code or response will be returned by the receiving 136 object. 137 138 \item[\apifunc{seL4\_NBSend}{sel4_nbsend}] performs a polling send 139 on an endpoint. It 140 is similar to \apifunc{seL4\_Send}{sel4_send}, except that it is 141 guaranteed not to block. If the message 142 cannot be delivered immediately, i.e.\ there is no receiver waiting 143 on the destination \obj{Endpoint}, the message is silently dropped. Like 144 \apifunc{seL4\_Send}{sel4_send}, no error code or response will be returned. 145 146 \item[\apifunc{seL4\_Call}{sel4_call}] combines \apifunc{seL4\_Send}{sel4_send} 147 and \apifunc{seL4\_Recv}{sel4_recv}. The call 148 blocks the sending thread until its message is delivered and a reply message is received. When the 149 sent message is delivered to another thread (via an 150 \obj{Endpoint}), the kernel adds an 151 additional `\emph{reply}' capability to the message that is delivered 152 to the receiver, giving the latter the right to reply to the original sender. The reply 153 capability is deposited in a dedicated slot in the receiver's 154 \obj{TCB}, and is a single-use right, meaning that the kernel 155 invalidates it as soon as it has been invoked. 156 157 The \apifunc{seL4\_Call}{sel4_call} operation exists not only for 158 efficiency reasons (combining two operations into a single system 159 call). It differs from 160 \apifunc{seL4\_Send}{sel4_send} immediately followed by 161 \apifunc{seL4\_Recv}{sel4_recv} in two ways: 162 \begin{enumerate} 163 \item the single-use reply capability is created to establish a 164 reply channel with minimal trust; 165 \item the transition from send to recv phase is atomic, meaning it 166 cannot be preempted, and the receiver can reply without any risk 167 of blocking. 168 \end{enumerate} 169 170 When invoking capabilities to kernel services, using 171 \apifunc{seL4\_Call}{sel4_call} allows the kernel to return an error code 172 or other response through the reply message. 173 174 \item[\apifunc{seL4\_Recv}{sel4_recv}] is used by a thread to receive 175 messages through endpoints or notifications. If no sender or 176 notification is pending, the caller 177 will block until a message or notification can be delivered. This system call works only on 178 \obj{Endpoint} or \obj{Notification} capabilities, raising a fault (see section \ref{sec:faults}) when 179 attempted with other capability types. 180 181 \item[\apifunc{seL4\_Reply}{sel4_reply}] is used to respond to a 182 \apifunc{seL4\_Call}{sel4_call}, using the reply capability generated by the 183 \apifunc{seL4\_Call}{sel4_call} system call and stored in the replying 184 thread's TCB. It delivers the message to the thread that invoked 185 the \apifunc{seL4\_Call}{sel4_call}, waking it in 186 the process. 187 188 There is space for only one reply capability in each thread's TCB, so the 189 \apifunc{seL4\_Reply}{sel4_reply} syscall can be used to reply to the most 190 recent caller only. The \apifunc{seL4\_CNode\_SaveCaller}{cnode_savecaller} 191 method that will be described later can be used to save the reply 192 capability into regular capability space, where it can be used with 193 \apifunc{seL4\_Send}{sel4_send}. 194 195 \item[\apifunc{seL4\_ReplyRecv}{sel4_replyrecv}] combines \apifunc{seL4\_Reply}{sel4_reply} and 196 \apifunc{seL4\_Recv}{sel4_recv}. It exists mostly for efficiency reasons: the common case of 197 replying to a request and waiting for the next can be performed in 198 a single kernel system call instead of two. The transition from 199 the reply to the receive phase is also atomic. 200 201 \item[\apifunc{seL4\_NBRecv}{sel4_nbrecv}] is used by a thread to check for 202 signals pending on a notification object or messages pending on an endpoint without blocking. 203 This system call works only on endpoints and notification object 204 capabilities, raising a fault (see section \ref{sec:faults}) when attempted 205 with other capability types. 206 207 \item[\apifunc{seL4\_Yield}{sel4_yield}] is the only system call that does not require 208 a capability to be used. It forfeits the remainder of the calling thread's 209 timeslice and causes invocation of the kernel's scheduler. 210 If there are no other runnable threads with the same 211 priority as the caller, the calling thread will immediately be 212 scheduled with a fresh timeslice. 213\end{description} 214 215\section{Kernel Objects} 216\label{s:sel4_internals} 217 218In this section we give a brief overview of the kernel-implemented 219object types whose instances (also simply called \emph{objects}) can be invoked by applications. The interface to these 220objects forms the interface to the kernel itself. The creation and use 221of kernel services is achieved by the creation, 222manipulation, and combination of these kernel objects: 223 224\begin{description} 225 226 \item[\obj{CNodes}] (see \autoref{ch:cspace}) store capabilities, giving threads permission to 227 invoke methods on particular objects. 228 Each \obj{CNode} has a fixed number of slots, 229 always a power of two, determined when the \obj{CNode} is created. Slots 230 can be empty or contain a capability. 231 232 \item[Thread Control Blocks] (\obj{TCB}s; see \autoref{ch:threads}) represent a thread of 233 execution in seL4. Threads are the unit of execution that is 234 scheduled, blocked, unblocked, etc., depending on the application's 235 interaction with other threads. 236 237 \item[\obj{Endpoints}] (see \autoref{ch:ipc}) facilitate message-passing 238 communication between threads. IPC is synchronous: A thread 239 trying to send or receive on an endpoint blocks until the message 240 can be delivered. This means that message delivery only happens if 241 a sender and a receiver rendezvous at the endpoint, and the 242 kernel can deliver the message with a single copy (or without 243 copying for short messages using only registers). 244 245 A capability to an endpoint can be restricted to be 246 send-only or receive-only. Additionally, \obj{Endpoint} 247 capabilities can have the grant right, which allows sending 248 capabilities as part of the message. 249 250 \item[\obj{Notification} Objects] (see \autoref{ch:notifications}) 251 provide a simple signalling mechanism. A \obj{Notification} 252 is a word-size array of flags, each of which behaves like a binary semaphore. Operations 253 are \emph{signalling} a subset of flags in a single operation, 254 polling to check any flags, 255 and blocking until any are signalled. Notification capabilities 256 can be signal-only or wait-only. 257 258 \item[Virtual Address Space Objects] (see \autoref{ch:vspace}) 259 are used to construct a virtual 260 address space (or VSpace) for one or more threads. These 261 objects largely directly correspond to those of the hardware, and 262 as such are architecture-dependent. The kernel also includes \obj{ASID 263 Pool} and \obj{ASID Control} objects for tracking the status of 264 address spaces. 265 266 \item[Interrupt Objects] (see \autoref{ch:io}) give applications the ability to receive 267 and acknowledge interrupts from hardware devices. 268 Initially, there is a capability to \obj{IRQControl}, 269 which allows for the creation of \obj{IRQHandler} capabilities. 270 An \obj{IRQHandler} capability permits the management of a specific 271 interrupt source associated with a specific device. 272 It is delegated to 273 a device driver to access an interrupt source. The \obj{IRQHandler} 274 object allows threads to wait for and acknowledge individual 275 interrupts. 276 277 \item[Untyped Memory] (see \autoref{sec:kernmemalloc}) is the foundation of memory allocation 278 in the seL4 kernel. Untyped memory capabilities have a single method 279 which allows the creation of new kernel objects. If the method 280 succeeds, the calling thread gains access to capabilities to the 281 newly-created objects. Additionally, untyped memory objects can be 282 divided into a group of smaller untyped memory objects allowing 283 delegation of part (or all) of the system's memory. We discuss 284 memory management in general in the following sections. 285 286\end{description} 287 288\section{Kernel Memory Allocation} 289\label{sec:kernmemalloc} 290 291The seL4 microkernel does not dynamically allocate memory for kernel objects. 292Instead, objects must be explicitly created from application-controlled memory 293regions via \obj{Untyped Memory} capabilities. Applications must have 294explicit authority to memory (through these \obj{Untyped Memory} capabilities) in 295order to create new objects, and all objects consume a fixed amount of memory once 296created. These mechanisms can be used to precisely control 297the specific amount of physical memory available to applications, 298including being able to enforce isolation of physical memory access 299between applications or a device. There are no arbitrary resource 300limits in the kernel apart from those dictated by the 301hardware\footnote{The treatment of virtual ASIDs imposes a fixed number 302of address spaces. This limitation is to be removed in future 303versions of seL4.}, and so many denial-of-service attacks via resource 304exhaustion are avoided. 305 306At boot time, seL4 pre-allocates the memory required for the kernel 307itself, including the code, data, and stack sections (seL4 is a single 308kernel-stack operating system). It then creates an initial user 309thread (with an appropriate address and capability space). 310The kernel then hands all remaining memory to 311the initial thread in the form of capabilities to \obj{Untyped Memory}, and 312some additional capabilities to kernel objects that were required to 313bootstrap the initial thread. These \obj{Untyped Memory} regions can then be split into 314smaller regions or other kernel objects using the 315\apifunc{seL4\_Untyped\_Retype}{untyped_retype} method; the created objects are termed \emph{children} of 316the original untyped memory object. 317 318The user-level application that creates an object using \apifunc{seL4\_Untyped\_Retype}{untyped_retype} 319receives full authority over the resulting object. It can then delegate 320all or part of the authority it possesses over this object to one or 321more of its clients. 322 323Untyped memory objects represent two different types of memory: 324general purpose memory, or device memory. 325\emph{General purpose} memory can be untyped into any other object 326type and used for any operation on untyped memory provided by the kernel. 327\emph{Device memory} covers memory regions reserved for devices 328as determined by the hardware platform, and usage of these objects 329is restricted by the kernel in the following ways: 330 331\begin{itemize} 332\item Device untyped objects can only be retyped into frames or other 333untyped objects; developers cannot, for example, create an endpoint from device memory. 334\item Frame objects retyped from device untyped objects cannot be set as thread IPC buffers, or used 335in the creation of an ASID pool 336\end{itemize} 337 338The type attribute (whether it represents \emph{general purpose} or 339\emph{device} memory) of a child untyped object is inherited from its 340parent untyped object. That is, any child of a device untyped will 341also be a device untyped. Developers cannot change the type attribute of an untyped. 342 343\subsection{Reusing Memory} 344\label{s:memRevoke} 345 346The model described thus far is sufficient for applications to 347allocate kernel objects, distribute authority among client 348applications, and obtain various kernel services provided by these 349objects. This alone is sufficient for a simple static system 350configuration. 351 352The seL4 kernel also allows \obj{Untyped Memory} regions to be reused. 353Reusing a region of memory is allowed only 354when there are no dangling references (i.e., capabilities) left to the 355objects inside that memory. The kernel tracks 356\emph{capability derivations}, i.e., the children generated by the 357methods \apifunc{seL4\_Untyped\_Retype}{untyped_retype}, \apifunc{seL4\_CNode\_Mint}{cnode_mint}, \apifunc{seL4\_CNode\_Copy}{cnode_copy}, and 358\apifunc{seL4\_CNode\_Mutate}{cnode_mutate}. 359 360The tree structure so generated is termed the \emph{capability 361derivation tree} (CDT).\footnote{Although the CDT conceptually is a separate 362data structure, it is implemented as part of the CNode object and so 363requires no additional kernel meta-data.} For example, when a user 364creates new kernel objects by retyping untyped memory, the newly created 365capabilities would be inserted into the CDT as children of the untyped 366memory capability. 367 368For each \obj{Untyped Memory} region, the kernel keeps 369a \emph{watermark} recording how much of the region has previously been 370allocated. Whenever a user requests the kernel to create new objects in 371an untyped memory region, the kernel will carry out one of two actions: 372if there are already existing objects allocated in the region, the 373kernel will allocate the new objects at the current watermark level, and 374increase the watermark. If all objects previously allocated in the 375region have been deleted, the kernel will reset the watermark and start 376allocating new objects from the beginning of the region again. 377 378Finally, the \apifunc{seL4\_CNode\_Revoke}{cnode_revoke} method provided by \obj{CNode} objects 379destroys all capabilities derived from the argument capability. Revoking 380the last capability to a kernel object triggers the \emph{destroy} 381operation on the now unreferenced object. This simply cleans up any in-kernel dependencies between 382it, other objects and the kernel. 383 384By calling \apifunc{seL4\_CNode\_Revoke}{cnode_revoke} on the original capability to an untyped memory 385object, the user removes all of the untyped memory object's 386children---that is, all capabilities pointing to objects in the untyped 387memory region. Thus, after this invocation there are no valid references 388to any object within the untyped region, and the region may be safely 389retyped and reused. 390 391\subsection{Summary of Object Sizes} 392\label{sec:object_sizes} 393 394When retyping untyped memory it is useful to know how much memory the 395object will require. Object sizes are defined in libsel4. 396 397Note that \obj{CNode}s and \obj{Untyped Object}s have variables sizes. 398When retyping untyped memory into \obj{CNode}s or breaking an 399\obj{Untyped Object} into smaller \obj{Untyped Object}s, the 400\texttt{size\_bits} argument to 401\apifunc{seL4\_Untyped\_Retype}{untyped_retype} is used to specify 402the size of the resulting objects. 403For all other object types, the size is fixed, and the \texttt{size\_bits} 404argument to \apifunc{seL4\_Untyped\_Retype}{untyped_retype} is ignored. 405 406A single call to \apifunc{seL4\_Untyped\_Retype}{untyped_retype} can retype a 407single \obj{Untyped Object} into multiple objects. The number of objects 408to create is specified by its \texttt{num\_objects} argument. All created 409objects must be of the same type, specified by the \texttt{type} argument. In 410the case of variable-sized objects, each object must also be of the same size. 411If the size of the memory area needed (calculated by the object size multiplied 412by \texttt{num\_objects}) is greater than the remaining unallocated memory of 413the \obj{Untyped Object}, an error will result. 414