manual/parts/ipc.tex

%
% Copyright 2014, General Dynamics C4 Systems
%
% SPDX-License-Identifier: GPL-2.0-only
%

\chapter{\label{ch:ipc}Message Passing (IPC)}

The seL4 microkernel provides a message-passing IPC mechanism for communication
between threads. The same mechanism is also used for communication with
kernel-provided services. Messages are sent by invoking a capability to a
kernel object. Messages sent to \obj{Endpoint}s are destined for other
threads, while messages sent to other objects are processed by the kernel. This
chapter describes the common message format, endpoints,
and how they can be used for communication between applications.

\section{Message Registers}
\label{sec:messageinfo}

Each message contains a number of message words and optionally a number of
capabilities.
The message words are sent to or received from a thread by placing them in its \emph{message registers}.
The message registers are numbered and the first few message registers are implemented
using physical CPU registers, while the rest are backed by a fixed region of
memory called the \emph{IPC buffer}.
The reason for this design is efficiency:
very short messages need not use the memory.
The IPC buffer is assigned to the calling thread (see \autoref{sec:threads} and \autoref{api:tcb_setipcbuffer}).
%FIXME: seL4_TCB_SetIPCBuffer is only mentioned in the API reference!

Every IPC message also has a tag (structure \texttt{seL4\_MessageInfo\_t}).  The
tag consists of four fields: the label, message length, number of capabilities
(the \texttt{extraCaps} field) and the \texttt{capsUnwrapped} field.  The
message length and number of capabilities determine either the number of
message registers and capabilities that the sending thread wishes to transfer,
or the number of message registers and capabilities that were actually
transferred. The label is not interpreted by the
kernel and is passed unmodified as the first data payload of the message. The
label may, for example, be used to specify a requested operation. The
\texttt{capsUnwrapped} field is used only on the receive side, to indicate the
manner in which capabilities were received. It is described in
\autoref{sec:cap-transfer}.

% FIXME: a little too low-level, perhaps?

\newcommand{\ipcparam}[4]{\texttt{#1} \emph{#2}&\texttt{#3}&#4\\ }
\begin{table}[htb]
    \begin{center}
    \begin{tabularx}{\textwidth}{p{0.28\textwidth}p{0.18\textwidth}X}
      \toprule
      \textbf{Type} & \textbf{Name} & \textbf{Description} \\
      \midrule
        \ipcparam{seL4\_MessageInfo\_t}{}{tag}{Message tag}
        \ipcparam{seL4\_Word[]}{}{msg}{Message contents}
        \ipcparam{seL4\_Word}{}{userData}{Base address of the structure, used by
        supporting user libraries}
        \ipcparam{seL4\_CPtr[]}{(in)}{caps}{Capabilities to transfer}
        \ipcparam{seL4\_CapData\_t[]}{(out)}{badges}{Badges for
        endpoint capabilities received}
        \ipcparam{seL4\_CPtr}{}{receiveCNode}{CPTR to a CNode from which to
        find
        the receive slot}
        \ipcparam{seL4\_CPtr}{}{receiveIndex}{CPTR to the receive slot
        relative to \texttt{receiveCNode}}
        \ipcparam{seL4\_Word}{}{receiveDepth}{Number of bits of
        \texttt{receiveIndex} to
        use}
        \bottomrule
      \end{tabularx}
    \caption{\label{tbl:ipcbuffer}Fields of the
      \texttt{seL4\_IPCBuffer} structure.  Note that
      \texttt{badges} and \texttt{caps} use the same area of memory in
      the structure.}
    \end{center}
\end{table}

The kernel assumes that the IPC buffer contains a structure of type
\texttt{seL4\_IPCBuffer} as defined in \autoref{tbl:ipcbuffer}. The
kernel uses as many physical registers as possible to transfer IPC
messages. When more arguments are transferred than physical message
registers are available, the kernel begins using the IPC buffer's
\texttt{msg} field to transfer arguments. However, it leaves room in
this array for the physical message registers. For example, if an IPC
transfer or kernel object invocation required
4 message registers (and there are only 2 physical message registers
available on this architecture) then arguments 1 and 2 would be
transferred via message registers and arguments 3 and 4 would be in
\texttt{msg[2]} and \texttt{msg[3]}.
This allows the user-level object-invocation stubs to copy the arguments passed in physical registers to
the space left in the \texttt{msg} array if desired.
The situation is similar for the tag field.
There is space for this field in the \texttt{seL4\_IPCBuffer} structure, which the kernel ignores.
User level stubs
may wish to copy the message tag from its CPU register to this field, although
the user level stubs provided with the kernel do not do this.

\section{Endpoints}

\obj{Endpoint}s allow a small amount
of data and capabilities (namely the IPC buffer) to be transferred between two
threads. \obj{Endpoint} objects are invoked directly using the seL4 system calls
described in \autoref{sec:syscalls}.

IPC \obj{Endpoints} uses a rendezvous model and as such is
synchronous and blocking. An \obj{Endpoint} object  may queue
threads either to send or to receive. If no receiver is ready, threads
performing the \apifunc{seL4\_Send}{sel4_send} or \apifunc{seL4\_Call}{sel4_call}
system calls will wait in a queue for the first available receiver. Likewise, if
no sender is ready, threads performing the \apifunc{seL4\_Recv}{sel4_recv}
system call or the second half of \apifunc{seL4\_ReplyRecv}{sel4_replyrecv}
will wait for the first available sender.

Trying to Send or Call without the Write right will fail and return an error. In
the case of Send the error is ignored (The kernel isn't allowed to reply). Thus
there is no way of knowing that a send has failed because of missing right.
On the other hand calling \apifunc{seL4\_Recv}{sel4_recv} with a endpoint capability that
does not have the Read right will raise a fault, see \autoref{sec:faults}. This
because otherwise the error message would be indistinguishable from a normal
message received from another thread via the endpoint.

\subsection{Endpoint Badges\label{s:ep-badge}}
\label{sec:ep-badges}

Endpoint capabilities may be \emph{minted} to
create a new endpoint capability with a \emph{badge} attached to it, a data
word chosen by the invoker of the \emph{mint} operation. When a message is sent to an endpoint using a badged
capability, the badge is transferred to the receiving thread's
\texttt{badge} register.

An endpoint capability with a zero badge is said to be \emph{unbadged}.
Such a capability can be badged with the \apifunc{seL4\_CNode\_Mutate}{cnode_mutate} or \apifunc{seL4\_CNode\_Mint}{cnode_mint}
invocations on the \obj{CNode} containing the capability. Endpoint
capabilities with badges cannot be unbadged, rebadged or used to create
child capabilities with different badges.

On 32-bit platforms, only the low 28 bits of the badge are available for use.
The kernel will silently ignore any usage of the high 4 bits.
On 64-bit platforms, 64 bits are available for badges.

\subsection{Capability Transfer}
\label{sec:cap-transfer}

Messages may contain capabilities, which will be copied to the
receiver, provided that the endpoint capability
invoked by the sending thread has Grant rights. An attempt to send
capabilities using an endpoint capability without the Grant right will
result in transfer of the raw message, without any capability transfer.

Capabilities to be sent in a message are specified in the sending thread's
IPC buffer in the \texttt{caps} field. Each entry in that array is interpreted
as a CPTR in the sending thread's capability space. The number of capabilities
to send is specified in the \texttt{extraCaps} field of the message tag.

The receiver specifies the slot
in which it is willing to receive a capability, with three fields within the IPC buffer: \texttt{receiveCNode}, \texttt{receiveIndex} and \texttt{receiveDepth}.
These fields specify the root CNode, capability address and number of bits to resolve, respectively, to find
the slot in which to put the capability. Capability
addressing is described in \autoref{sec:cap_addressing}.

Note that receiving threads may specify only one receive slot, whereas a
sending thread may include multiple capabilities in the message. Messages
containing more than one capability may be interpreted by kernel objects. They
may also be sent to receiving threads in the case where some of the extra
capabilities in the message can be \emph{unwrapped}.

If the n-th capability in the message refers to the endpoint through
which the message is sent, the capability is \emph{unwrapped}: its badge is placed into
the n-th
position of the receiver's badges array, and the kernel sets the n-th bit (counting from the
least significant) in the \texttt{capsUnwrapped} field of the message
tag. The capability itself is not transferred, so the receive slot may be used
for another capability.

A capability that is not unwrapped is transferred by copying it from the
sender's CNode slot to the receiver's CNode slot. The sender retains access
to the sent capability.

If a receiver gets a message whose tag has an \texttt{extraCaps} of 2 and a
\texttt{capsUnwrapped} of 2, then the first capability in the message was
transferred to the specified receive slot and the second capability was
unwrapped, placing its badge in \texttt{badges[1]}. There may have been a
third capability in the sender's message which could not be unwrapped.

\subsection{Errors}

Errors in capability transfers can occur at two places: in the send
phase or in the receive phase. In the send phase, all capabilities that
the caller is attempting to send are looked up to ensure that they exist
before the send is initiated in the kernel. If the lookup fails for any
reason, \apifunc{seL4\_Send}{sel4_send} and \apifunc{seL4\_Call}{sel4_call} system calls immediately abort and
no IPC or capability transfer takes place. The system call will return
a lookup failure error as described in \autoref{sec:errors}.

In the receive phase, seL4 transfers capabilities in the order that they
are found in the sending thread's IPC buffer \texttt{caps} array
and terminates as soon as an error is encountered. Possible error
conditions are:

\begin{itemize}
    \item A source capability cannot be looked up. Although the presence
    of the source capabilities is checked when the sending thread
    performs the send system call, this error may still occur. The sending
    thread may have been blocked on the endpoint for some time before it
    was paired with a receiving thread. During this time, its
    CSpace may have changed and the source capability pointers may
    no longer be valid.

    \item The destination slot cannot be looked up. Unlike the send
    system call, the \apifunc{seL4\_Recv}{sel4_recv} system call does not check that the
    destination slot exists and is empty before it initiates the receive.
    Hence, the \apifunc{seL4\_Recv}{sel4_recv} system call will not fail with an error if the
    destination slot is invalid and will instead transfer badged
    capabilities until an attempt to save a capability to the
    destination slot is made.

    \item The capability being transferred cannot be derived. See
    \autoref{sec:cap_derivation} for details.
\end{itemize}

An error will not void the entire transfer, it will just end it
prematurely. The capabilities processed before the failure are still
transferred and the \texttt{extraCaps} field in the receiver's IPC
buffer is set to the number of capabilities transferred up to failure.
No error message will be returned to the receiving thread in any of the
above cases.

\subsection{Calling and Replying}
\label{sec:ep-cal}

As explained in \autoref{sec:sys_call}, when the user calls
\apifunc{seL4\_Call}{sel4_call} on an endpoint capability,
some specific actions are taken. First a call will do exactly the same action as
a normal \apifunc{seL4\_Send}{sel4_send}. Then after the rendezvous and all the
normal IPC procedure happened, instead of returning directly to the caller,
\apifunc{seL4\_Call}{sel4_call} will check if either Grant or GrantReply are
present on the invoked endpoint capability:

\begin{itemize}
\item If this is not the case, the caller thread is suspended as if
  \apifunc{seL4\_TCB\_Suspend}{tcb_suspend} was called on it. The send part of
  the call would still have been performed as usual.
\item If this is the case. A reply capability is set in a specific slot of the
  receiver TCB. The Grant right of that reply capability is set by copying the Grant
  right of the endpoint capability invoked by the receiver in
  \apifunc{seL4\_Recv}{sel4_recv}.
  Then, the caller thread is blocked waiting for the reply.
  \end{itemize}

A reply capability points directly to the caller thread and once the call has
been performed is completely unrelated to the original \obj{Endpoint}. Even if
the latter was destroyed, the reply capability would still exist and point to
the caller who would still be waiting for a reply.

The generated reply capability can then be either invoked in place (in the
specific TCB slot) with the \apifunc{seL4\_Reply}{sel4_reply} or saved to an
addressable slot using \apifunc{seL4\_CNode\_SaveCaller}{cnode_savecaller} to be
invoked later with \apifunc{seL4\_Send}{sel4_send}. The specific slot cannot be
directly addressed with any CPtr as it is not part of any CSpace.

A reply capability is invoked in the same way as a normal send on a
\obj{Endpoint}. A reply capability has implicitly the Write right, so the
message will always go through. Transferring caps in the reply can only happen
if the reply capability has the Grant right and is done in exactly the same way
as in a normal IPC transfer as described in \autoref{sec:cap-transfer}.

The main difference with a normal endpoint transfer is that the kernel guarantees
that invoking a reply capability cannot block: If you own a reply capability,
then the thread it points to is waiting for a reply. However a reply capability
is a non-owning reference, contrary to all the other capabilities. That means that
if the caller thread is destroyed or modified in any way that would render
a reply impossible (for example being suspended with
\apifunc{seL4\_TCB\_Suspend}{tcb_suspend}), the kernel would immediately destroy
the reply capability.

Once the reply capability has been invoked, the caller receives the message as if
it has been performing a \apifunc{seL4\_Recv}{sel4_recv} and just received the
message. In particular, it starts running again.

The \apifunc{seL4\_Call}{sel4_call} operation exists not only for
efficiency reasons (combining two operations into a single system
call). It differs from
\apifunc{seL4\_Send}{sel4_send} immediately followed by
\apifunc{seL4\_Recv}{sel4_recv} in ways that allow certain system setup to work
much more efficiently with much less setup that with a traditional setup.
In particular, it is guaranteed that the reply received by the caller comes from
the thread that received the call without having to check any kind of badge.