1%
2% Copyright 2014, General Dynamics C4 Systems
3%
4% SPDX-License-Identifier: GPL-2.0-only
5%
6
7\chapter{\label{ch:ipc}Message Passing (IPC)}
8
9The seL4 microkernel provides a message-passing IPC mechanism for communication
10between threads. The same mechanism is also used for communication with
11kernel-provided services. Messages are sent by invoking a capability to a
12kernel object. Messages sent to \obj{Endpoint}s are destined for other
13threads, while messages sent to other objects are processed by the kernel. This
14chapter describes the common message format, endpoints,
15and how they can be used for communication between applications.
16
17\section{Message Registers}
18\label{sec:messageinfo}
19
20Each message contains a number of message words and optionally a number of
21capabilities.
22The message words are sent to or received from a thread by placing them in its \emph{message registers}.
23The message registers are numbered and the first few message registers are implemented
24using physical CPU registers, while the rest are backed by a fixed region of
25memory called the \emph{IPC buffer}.
26The reason for this design is efficiency:
27very short messages need not use the memory.
28The IPC buffer is assigned to the calling thread (see \autoref{sec:threads} and \autoref{api:tcb_setipcbuffer}).
29%FIXME: seL4_TCB_SetIPCBuffer is only mentioned in the API reference!
30
31Every IPC message also has a tag (structure \texttt{seL4\_MessageInfo\_t}).  The
32tag consists of four fields: the label, message length, number of capabilities
33(the \texttt{extraCaps} field) and the \texttt{capsUnwrapped} field.  The
34message length and number of capabilities determine either the number of
35message registers and capabilities that the sending thread wishes to transfer,
36or the number of message registers and capabilities that were actually
37transferred. The label is not interpreted by the
38kernel and is passed unmodified as the first data payload of the message. The
39label may, for example, be used to specify a requested operation. The
40\texttt{capsUnwrapped} field is used only on the receive side, to indicate the
41manner in which capabilities were received. It is described in
42\autoref{sec:cap-transfer}.
43
44% FIXME: a little too low-level, perhaps?
45
46\newcommand{\ipcparam}[4]{\texttt{#1} \emph{#2}&\texttt{#3}&#4\\ }
47\begin{table}[htb]
48    \begin{center}
49    \begin{tabularx}{\textwidth}{p{0.28\textwidth}p{0.18\textwidth}X}
50      \toprule
51      \textbf{Type} & \textbf{Name} & \textbf{Description} \\
52      \midrule
53        \ipcparam{seL4\_MessageInfo\_t}{}{tag}{Message tag}
54        \ipcparam{seL4\_Word[]}{}{msg}{Message contents}
55        \ipcparam{seL4\_Word}{}{userData}{Base address of the structure, used by
56        supporting user libraries}
57        \ipcparam{seL4\_CPtr[]}{(in)}{caps}{Capabilities to transfer}
58        \ipcparam{seL4\_CapData\_t[]}{(out)}{badges}{Badges for
59        endpoint capabilities received}
60        \ipcparam{seL4\_CPtr}{}{receiveCNode}{CPTR to a CNode from which to
61        find
62        the receive slot}
63        \ipcparam{seL4\_CPtr}{}{receiveIndex}{CPTR to the receive slot
64        relative to \texttt{receiveCNode}}
65        \ipcparam{seL4\_Word}{}{receiveDepth}{Number of bits of
66        \texttt{receiveIndex} to
67        use}
68        \bottomrule
69      \end{tabularx}
70    \caption{\label{tbl:ipcbuffer}Fields of the
71      \texttt{seL4\_IPCBuffer} structure.  Note that
72      \texttt{badges} and \texttt{caps} use the same area of memory in
73      the structure.}
74    \end{center}
75\end{table}
76
77The kernel assumes that the IPC buffer contains a structure of type
78\texttt{seL4\_IPCBuffer} as defined in \autoref{tbl:ipcbuffer}. The
79kernel uses as many physical registers as possible to transfer IPC
80messages. When more arguments are transferred than physical message
81registers are available, the kernel begins using the IPC buffer's
82\texttt{msg} field to transfer arguments. However, it leaves room in
83this array for the physical message registers. For example, if an IPC
84transfer or kernel object invocation required
854 message registers (and there are only 2 physical message registers
86available on this architecture) then arguments 1 and 2 would be
87transferred via message registers and arguments 3 and 4 would be in
88\texttt{msg[2]} and \texttt{msg[3]}.
89This allows the user-level object-invocation stubs to copy the arguments passed in physical registers to
90the space left in the \texttt{msg} array if desired.
91The situation is similar for the tag field.
92There is space for this field in the \texttt{seL4\_IPCBuffer} structure, which the kernel ignores.
93User level stubs
94may wish to copy the message tag from its CPU register to this field, although
95the user level stubs provided with the kernel do not do this.
96
97\section{Endpoints}
98
99\obj{Endpoint}s allow a small amount
100of data and capabilities (namely the IPC buffer) to be transferred between two
101threads. \obj{Endpoint} objects are invoked directly using the seL4 system calls
102described in \autoref{sec:syscalls}. 
103
104IPC \obj{Endpoints} uses a rendezvous model and as such is
105synchronous and blocking. An \obj{Endpoint} object  may queue
106threads either to send or to receive. If no receiver is ready, threads
107performing the \apifunc{seL4\_Send}{sel4_send} or \apifunc{seL4\_Call}{sel4_call}
108system calls will wait in a queue for the first available receiver. Likewise, if
109no sender is ready, threads performing the \apifunc{seL4\_Recv}{sel4_recv}
110system call or the second half of \apifunc{seL4\_ReplyRecv}{sel4_replyrecv}
111will wait for the first available sender.
112
113Trying to Send or Call without the Write right will fail and return an error. In
114the case of Send the error is ignored (The kernel isn't allowed to reply). Thus
115there is no way of knowing that a send has failed because of missing right.
116On the other hand calling \apifunc{seL4\_Recv}{sel4_recv} with a endpoint capability that
117does not have the Read right will raise a fault, see \autoref{sec:faults}. This
118because otherwise the error message would be indistinguishable from a normal
119message received from another thread via the endpoint.
120
121\subsection{Endpoint Badges\label{s:ep-badge}}
122\label{sec:ep-badges}
123
124Endpoint capabilities may be \emph{minted} to
125create a new endpoint capability with a \emph{badge} attached to it, a data
126word chosen by the invoker of the \emph{mint} operation. When a message is sent to an endpoint using a badged
127capability, the badge is transferred to the receiving thread's
128\texttt{badge} register.
129
130An endpoint capability with a zero badge is said to be \emph{unbadged}.
131Such a capability can be badged with the \apifunc{seL4\_CNode\_Mutate}{cnode_mutate} or \apifunc{seL4\_CNode\_Mint}{cnode_mint}
132invocations on the \obj{CNode} containing the capability. Endpoint
133capabilities with badges cannot be unbadged, rebadged or used to create
134child capabilities with different badges.
135
136On 32-bit platforms, only the low 28 bits of the badge are available for use.
137The kernel will silently ignore any usage of the high 4 bits.
138On 64-bit platforms, 64 bits are available for badges.
139
140\subsection{Capability Transfer}
141\label{sec:cap-transfer}
142
143Messages may contain capabilities, which will be copied to the
144receiver, provided that the endpoint capability
145invoked by the sending thread has Grant rights. An attempt to send
146capabilities using an endpoint capability without the Grant right will
147result in transfer of the raw message, without any capability transfer.
148
149Capabilities to be sent in a message are specified in the sending thread's
150IPC buffer in the \texttt{caps} field. Each entry in that array is interpreted
151as a CPTR in the sending thread's capability space. The number of capabilities
152to send is specified in the \texttt{extraCaps} field of the message tag.
153
154The receiver specifies the slot
155in which it is willing to receive a capability, with three fields within the IPC buffer: \texttt{receiveCNode}, \texttt{receiveIndex} and \texttt{receiveDepth}.
156These fields specify the root CNode, capability address and number of bits to resolve, respectively, to find
157the slot in which to put the capability. Capability
158addressing is described in \autoref{sec:cap_addressing}.
159
160Note that receiving threads may specify only one receive slot, whereas a
161sending thread may include multiple capabilities in the message. Messages
162containing more than one capability may be interpreted by kernel objects. They
163may also be sent to receiving threads in the case where some of the extra
164capabilities in the message can be \emph{unwrapped}.
165
166If the n-th capability in the message refers to the endpoint through
167which the message is sent, the capability is \emph{unwrapped}: its badge is placed into
168the n-th
169position of the receiver's badges array, and the kernel sets the n-th bit (counting from the
170least significant) in the \texttt{capsUnwrapped} field of the message
171tag. The capability itself is not transferred, so the receive slot may be used
172for another capability.
173
174A capability that is not unwrapped is transferred by copying it from the
175sender's CNode slot to the receiver's CNode slot. The sender retains access
176to the sent capability.
177
178If a receiver gets a message whose tag has an \texttt{extraCaps} of 2 and a
179\texttt{capsUnwrapped} of 2, then the first capability in the message was
180transferred to the specified receive slot and the second capability was
181unwrapped, placing its badge in \texttt{badges[1]}. There may have been a
182third capability in the sender's message which could not be unwrapped.
183
184\subsection{Errors}
185
186Errors in capability transfers can occur at two places: in the send
187phase or in the receive phase. In the send phase, all capabilities that
188the caller is attempting to send are looked up to ensure that they exist
189before the send is initiated in the kernel. If the lookup fails for any
190reason, \apifunc{seL4\_Send}{sel4_send} and \apifunc{seL4\_Call}{sel4_call} system calls immediately abort and
191no IPC or capability transfer takes place. The system call will return
192a lookup failure error as described in \autoref{sec:errors}.
193
194In the receive phase, seL4 transfers capabilities in the order that they
195are found in the sending thread's IPC buffer \texttt{caps} array
196and terminates as soon as an error is encountered. Possible error
197conditions are:
198
199\begin{itemize}
200    \item A source capability cannot be looked up. Although the presence
201    of the source capabilities is checked when the sending thread
202    performs the send system call, this error may still occur. The sending
203    thread may have been blocked on the endpoint for some time before it
204    was paired with a receiving thread. During this time, its
205    CSpace may have changed and the source capability pointers may
206    no longer be valid.
207
208    \item The destination slot cannot be looked up. Unlike the send
209    system call, the \apifunc{seL4\_Recv}{sel4_recv} system call does not check that the
210    destination slot exists and is empty before it initiates the receive.
211    Hence, the \apifunc{seL4\_Recv}{sel4_recv} system call will not fail with an error if the
212    destination slot is invalid and will instead transfer badged
213    capabilities until an attempt to save a capability to the
214    destination slot is made.
215
216    \item The capability being transferred cannot be derived. See
217    \autoref{sec:cap_derivation} for details.
218\end{itemize}
219
220An error will not void the entire transfer, it will just end it
221prematurely. The capabilities processed before the failure are still
222transferred and the \texttt{extraCaps} field in the receiver's IPC
223buffer is set to the number of capabilities transferred up to failure.
224No error message will be returned to the receiving thread in any of the
225above cases.
226
227\subsection{Calling and Replying}
228\label{sec:ep-cal}
229
230As explained in \autoref{sec:sys_call}, when the user calls
231\apifunc{seL4\_Call}{sel4_call} on an endpoint capability,
232some specific actions are taken. First a call will do exactly the same action as
233a normal \apifunc{seL4\_Send}{sel4_send}. Then after the rendezvous and all the
234normal IPC procedure happened, instead of returning directly to the caller,
235\apifunc{seL4\_Call}{sel4_call} will check if either Grant or GrantReply are
236present on the invoked endpoint capability:
237
238\begin{itemize}
239\item If this is not the case, the caller thread is suspended as if
240  \apifunc{seL4\_TCB\_Suspend}{tcb_suspend} was called on it. The send part of
241  the call would still have been performed as usual.
242\item If this is the case. A reply capability is set in a specific slot of the
243  receiver TCB. The Grant right of that reply capability is set by copying the Grant
244  right of the endpoint capability invoked by the receiver in
245  \apifunc{seL4\_Recv}{sel4_recv}.
246  Then, the caller thread is blocked waiting for the reply.
247  \end{itemize}
248
249A reply capability points directly to the caller thread and once the call has
250been performed is completely unrelated to the original \obj{Endpoint}. Even if
251the latter was destroyed, the reply capability would still exist and point to
252the caller who would still be waiting for a reply.
253
254The generated reply capability can then be either invoked in place (in the
255specific TCB slot) with the \apifunc{seL4\_Reply}{sel4_reply} or saved to an
256addressable slot using \apifunc{seL4\_CNode\_SaveCaller}{cnode_savecaller} to be
257invoked later with \apifunc{seL4\_Send}{sel4_send}. The specific slot cannot be
258directly addressed with any CPtr as it is not part of any CSpace.
259
260A reply capability is invoked in the same way as a normal send on a
261\obj{Endpoint}. A reply capability has implicitly the Write right, so the
262message will always go through. Transferring caps in the reply can only happen
263if the reply capability has the Grant right and is done in exactly the same way
264as in a normal IPC transfer as described in \autoref{sec:cap-transfer}.
265
266The main difference with a normal endpoint transfer is that the kernel guarantees
267that invoking a reply capability cannot block: If you own a reply capability,
268then the thread it points to is waiting for a reply. However a reply capability
269is a non-owning reference, contrary to all the other capabilities. That means that
270if the caller thread is destroyed or modified in any way that would render
271a reply impossible (for example being suspended with
272\apifunc{seL4\_TCB\_Suspend}{tcb_suspend}), the kernel would immediately destroy
273the reply capability.
274
275Once the reply capability has been invoked, the caller receives the message as if
276it has been performing a \apifunc{seL4\_Recv}{sel4_recv} and just received the
277message. In particular, it starts running again.
278
279The \apifunc{seL4\_Call}{sel4_call} operation exists not only for
280efficiency reasons (combining two operations into a single system
281call). It differs from
282\apifunc{seL4\_Send}{sel4_send} immediately followed by
283\apifunc{seL4\_Recv}{sel4_recv} in ways that allow certain system setup to work
284much more efficiently with much less setup that with a traditional setup.
285In particular, it is guaranteed that the reply received by the caller comes from
286the thread that received the call without having to check any kind of badge.
287