%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Copyright (c) 2011, ETH Zurich.
% All rights reserved.
%
% This file is distributed under the terms in the attached LICENSE file.
% If you do not find this file, copies can be found by writing to:
% ETH Zurich D-INFK, Universitaetstrasse 6, CH-8092 Zurich. Attn: Systems Group.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\documentclass[a4paper,twoside]{report} % for a report (default)

\usepackage{bftn,color} % You need this
\usepackage{subfig}
\usepackage{listings}
\usepackage{verbatim}

\title{Barrelfish Architecture Overview} % title of report
\author{Team Barrelfish}                 % author
\tnnumber{000}                           % give the number of the tech report
\tnkey{Barrelfish Overview}              % Short title, will appear in footer

% \date{Month Year} % Not needed - will be taken from version history

%% \newcommand{\note}[1]{}
\newcommand{\note}[1]{[\textcolor{red}{\textit{#1}}]}

\begin{document}
\maketitle

%
% Include version history first
%
\begin{versionhistory}
\vhEntry{1.0}{24.06.2010}{ikuz, amarp}{Initial version}
\vhEntry{2.0}{04.12.2013}{troscoe,shindep}{Extensively updated}
\end{versionhistory}

% \intro{Abstract}          % Insert abstract here
% \intro{Acknowledgements}  % Uncomment (if needed) for acknowledgements

\tableofcontents            % Uncomment (if needed) for final draft
% \listoffigures            % Uncomment (if needed) for final draft
% \listoftables             % Uncomment (if needed) for final draft

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Barrelfish Architecture Overview}

This document is intended as an introduction to Barrelfish. It presents a
high-level overview of the architecture of the Barrelfish operating system,
together with the most important concepts used inside the OS at various
levels. It does not provide a ``getting started'' guide to running Barrelfish
on a PC, or in an emulation environment like QEMU or GEM5; this is provided in
other documentation.

\section{High level overview}\label{sec:overview}

Barrelfish is a ``multikernel'' operating system~\cite{barrelfish:sosp09}: it
consists of a small kernel running on each core (one kernel per core), while
the rest of the OS is structured as a distributed system of single-core
processes atop these kernels. Kernels share no memory, even on a machine with
cache-coherent shared RAM, and the rest of the OS does not use shared memory
except for transferring messages and data between cores, and for booting other
cores. Applications can use multiple cores and share address spaces (and
therefore cache-coherent shared memory) between cores, but this facility is
provided by user-space runtime libraries.

Figure~\ref{fig:os-arch} shows an overview of the components of the OS. Each
core runs a kernel which is called a ``CPU driver''. The CPU driver maintains
capabilities, executes syscalls on capabilities, schedules dispatchers, and
processes interrupts, page faults, traps, and exceptions.

The kernel schedules and runs ``dispatchers''. Dispatchers are an
implementation of a form of scheduler
activations~\cite{Anderson:1991:SAE:121132.121151} (the term is borrowed from
K42~\cite{k42:scheduling}); thus each dispatcher can run and schedule its own
threads.

Multiple dispatchers can be combined into a domain. Typically this is used to
combine related dispatchers running on different cores. Often dispatchers in a
domain share (all or part of) their vspace.
Unless dispatchers in a domain run on the same core (which is possible, but
rare) they cannot share a cspace (since cspaces are core-specific). Typically
we refer to user-level processes (or services or servers) as ``user domains''.
Often these consist of a single dispatcher running on a single core.

\begin{figure}[hbt]
  \begin{center}
    \includegraphics[width=0.5\columnwidth]{os-arch.pdf}
  \end{center}
  \caption{High level overview of the Barrelfish OS architecture}\label{fig:os-arch}
\end{figure}

\section{CPU drivers}

The kernel which runs on a given core in a Barrelfish machine is called a
\emph{CPU driver}. Each core runs a separate instance of the CPU driver, and
there is no reason why these drivers have to be the same code. In a
heterogeneous Barrelfish system, CPU drivers will be different for each
architecture. However, even on a homogeneous machine, CPU drivers for
different cores can be specialized for particular purposes (for example, some
might support virtualization extensions, while others might be optimized for
running a single application).

CPU drivers are single-threaded and non-preemptible. They run with interrupts
disabled on their core, and share no state with other cores. CPU drivers can
be thought of as serially executing exception handlers for events such as
interrupts, faults, and system calls from user-space tasks. Each such handler
executes in bounded time and runs to completion. If there are no such handlers
to execute, the CPU driver executes a user-space task.

The functions of a CPU driver are to provide:
\begin{itemize}
\item Scheduling of different user-space \textit{dispatchers} on the local
  core.
\item Core-local communication of short messages between dispatchers using a
  variant of Lightweight RPC~\cite{lrpc:tocs90} or L4
  RPC~\cite{Liedtke:1993:IIK:168619.168633}.
\item Secure access to the core hardware, MMU, APIC, etc.
\item Local access control to kernel objects and physical memory by means of
  capabilities.
\end{itemize}

CPU drivers do not provide kernel threads, for two reasons. Firstly, the
abstraction of the processor provided to user-space programs is a dispatcher,
rather than a thread. Secondly, since the kernel is single-threaded and
non-preemptible, it uses only a single, statically allocated stack for all
operations. After executing a single exception handler, the contents of this
stack are discarded and reset.

CPU drivers schedule dispatchers. The scheduling algorithm employed by a given
CPU driver binary is configurable at compile time, and two are currently
implemented:
\begin{itemize}
\item A simple round-robin scheduler. This is primarily used for debugging
  purposes, since its behavior is easy to understand.
\item A rate-based scheduler implementing a version of the RBED
  algorithm~\cite{Brandt:2003:DIS:956418.956606}. This is the preferred
  per-core scheduler for Barrelfish, and provides efficient scheduling of a
  mix of hard and soft real-time tasks, with good support for best-effort
  processes as well.
\end{itemize}

\section{Dispatchers}

The dispatcher is the unit of kernel scheduling, and on a single core roughly
corresponds to the concept of a process in Unix. A dispatcher can be thought
of as the local component of an application (often called a \textit{domain} in
Barrelfish) on a particular core. An application which spans multiple cores
(for example, an OpenMP program) has a dispatcher on each core that it might
potentially execute on.
Operating system tasks in Barrelfish are always single-core by design, and
therefore have only one dispatcher. Applications, however, frequently have
more than one: one on each core. Dispatchers do not migrate between cores.

When a CPU driver decides to run a dispatcher as a result of a scheduling
decision, it \emph{upcalls}~\cite{Clark:1985:SSU:323647.323645} into the
dispatcher as an exit from kernel mode in the manner described below. The
user-level code associated with the dispatcher then performs user-space thread
scheduling, as well as handling other events (such as a page fault signal from
the kernel or the arrival of an inter-domain message).

\begin{figure}[hbt]
  \begin{center}
    \includegraphics[width=0.5\columnwidth]{dcb.pdf}
  \end{center}
  \caption{Dispatcher Control Block}\label{fig:dcb}
\end{figure}

The kernel maintains a DCB (dispatcher control block) for each dispatcher
(Figure~\ref{fig:dcb}). The DCB contains entries that define the dispatcher's
cspace (capability tables), vspace (page tables), some scheduling parameters,
and a pointer to a user-space dispatcher structure (in practice this consists
of several structs). This structure manages the scheduling of the dispatcher's
threads.

The dispatcher can be in one of two modes: enabled and disabled. It is in
enabled mode when running user thread code. It is in disabled mode when
running dispatcher code (e.g., when managing TCBs, run queues, etc.).

The dispatcher defines a number of upcall entry points that are invoked by the
kernel when it schedules the dispatcher. The main upcall is \texttt{run()}.
When the kernel decides to schedule a dispatcher, it brings in the base page
table pointed to by the vspace capability, and calls the dispatcher's
\texttt{run()} upcall. The dispatcher must decide which thread it wants to
run, restore the thread's state, and then run it. Note that when
\texttt{run()} is invoked the dispatcher first runs in disabled mode until the
thread actually starts executing, at which point the dispatcher switches to
enabled mode.

When a dispatcher running in enabled mode is preempted, the kernel saves all
register state to a save area (labelled ``enabled''). When the dispatcher is
next restarted it can restore this register state and run the preempted
thread, or it can decide to schedule another thread, in which case it must
first save the stored register state to the preempted thread's TCB. When the
kernel preempts a dispatcher running in disabled state, it stores the register
state in a save area labelled ``disabled''. When the kernel schedules a
dispatcher that is in disabled state, it does not invoke \texttt{run()}.
Instead it restores the registers stored in the disabled save area. This
causes the dispatcher to resume execution from where it was preempted.

\section{Runtime libraries}

The basic runtime of any Barrelfish program consists of two libraries: a
standard C library (currently a version of \texttt{newlib}~\cite{newlib}), and
\texttt{libbarrelfish}. In practice the two libraries are inter-dependent,
though there is an ongoing effort to minimize truly cyclic dependencies. The
principal cycles involve \texttt{malloc()} and communication with the memory
server.

The \texttt{libbarrelfish} library implements the following functionality:
\begin{itemize}
\item User-level thread scheduling, based on the upcall model of the
  dispatcher.
\item Physical memory management by communication with the memory server.
\item Capability management (maintaining the domain's cspace), including
  allocating new CNodes where necessary.
\item Construction of virtual address spaces using capability invocations.
\item User-level page fault handling.
\item Communications using message channels, including blocking and
  non-blocking calls on channels, and the implementation of \texttt{waitset}s
  (analogous to Unix \texttt{select()} and \texttt{epoll}).
\end{itemize}

\section{Capabilities}

Barrelfish uses a single model of \emph{capabilities}~\cite{hank:capabilities}
to control access to all physical memory, kernel objects, communication
end-points, and other miscellaneous access rights. The Barrelfish capability
model is similar to the seL4 model~\cite{sel4:iies08}, with a considerably
larger type system and extensions for distributed capability management
between cores.

Kernel objects are referenced by partitioned capabilities. The actual
capability can only be directly accessed and manipulated by the kernel, while
user level has access only to capability references (\texttt{struct capref}),
which are addresses in a cspace. User level can only manipulate capabilities
using kernel system calls (to which it passes capability references). In
processing a system call the kernel looks up the capability reference in the
appropriate cspace to find the actual capability; this process is similar to
the translation of a virtual address to a physical address
(Figure~\ref{fig:cap_translation}). From a capability reference one can derive
a \texttt{caddr\_t} and a number of \emph{valid bits}, which are the values
used in system calls (see \texttt{include/barrelfish/caddr.h}). The valid bits
are needed when the user wants to refer to a CNode-type capability itself
(otherwise the translation process would continue by looking up entries within
that CNode).

\begin{figure}
  \centering
  \subfloat[capref to capability translation.]{\label{fig:cap_translation}\includegraphics[width=0.4\textwidth]{cap_translation.pdf}}
  \quad
  \subfloat[Capability Hierarchy]{\label{fig:cap_hierarchy}\includegraphics[width=0.5\textwidth]{cap_heirarchy.pdf}}
  \caption{Capabilities}
  \label{fig:caps}
\end{figure}

A dispatcher only has access to the capability references in its cspace. As in
seL4, capabilities are typed, and there is a strictly defined derivation
hierarchy (see Figure~\ref{fig:cap_hierarchy}). The type system for
capabilities is defined using a domain-specific language called Hamlet.

\section{Physical Memory}

All regions of physical address space are referred to by means of
capabilities. At present, a memory capability refers to a naturally-aligned,
power-of-two-sized region at least as large as a physical page, though this
restriction is likely to go away in the future. A memory capability can be
split into smaller capabilities, and this basic mechanism is used for
low-level physical memory allocation.

Memory capabilities are \emph{typed}, and each region type supports a limited
set of operations. Initially, memory consists of untyped RAM and \emph{device
frames}, which are used to refer to memory-mapped I/O regions. RAM
capabilities can be retyped into other types; the precise set of types is
defined in the Hamlet language in a file currently located in
\texttt{/capabilities/caps.hl}. Some of the more common types of memory
capability are:
\begin{itemize}
\item \emph{Frame} capabilities are RAM capabilities which can be mapped into
  a user's virtual address space.
\item \emph{CNode} capabilities are used to hold the bit representation of
  other capabilities, and to construct a domain's cspace.
  CNodes can never be mapped as writeable virtual memory, since this would
  undermine the security of the capability system by allowing an application
  to forge a capability.
\item \emph{Dispatcher} capabilities hold dispatcher control blocks.
\item \emph{Page table} capabilities refer to memory which holds page table
  pages. There is a different capability type for each level of each type of
  MMU architecture.
\end{itemize}

\section{Virtual Memory}

Barrelfish applications are
\emph{self-paging}~\cite{Hand:1999:SNO:296806.296812}: they securely construct
and maintain their own page tables, and page faults are reflected back to the
application by means of an upcall.

An application constructs its own virtual address space (generally abbreviated
to \emph{vspace}) using the capability system. It acquires RAM from a memory
allocator in the form of RAM capabilities, and then retypes these into frames
and page tables.

To take a 32-bit x86 machine without Physical Address Extensions (PAE) as a
simple example, to map a frame into its vspace an application invokes the
capability referring to a Page Table page using a system call. The arguments
to this capability invocation are (1) a capability for a 4kB frame, (2) a slot
number (from 0 to 1023), and (3) a set of mapping flags. This will cause the
kernel to insert a Page Table Entry (PTE) with the appropriate flags into the
corresponding slot in the Page Table. Note that, while the user determines
exactly what mapping is entered in the page table, the process is secure: the
user cannot map a frame that they do not already hold a capability for, that
capability must refer to a frame (and not some other type of memory, such as a
dispatcher control block or CNode), and they must also hold a capability for
the page table itself.

Similarly, in this example, for the Page Table page to be useful it must
itself be referenced from a Page Directory. To ensure this, the user program
must perform a similar invocation on the Page Directory capability, passing
the slot number, flags, and the capability to the Page Table page. As before,
the capability type system allows neither the construction of an invalid page
table for any architecture, nor the mapping of any page for which the user has
no authorization.

This model also has the additional feature that any core can construct a valid
page table for any other core, even if the cores have different MMU
architectures. An x86\_64 core can construct a valid and secure ARMv7 virtual
address space, and pass a capability for the L1 Page Table of this vspace to
an ARMv7 core for installation.

Furthermore, since the application is responsible for constructing page tables
and allocating physical memory itself, it can handle page faults by changing
its own mappings, potentially paging data to and from stable storage in the
process. Naturally, this pushes significant complexity into the application.
Paging functionality is generally hidden in the runtime \texttt{libbarrelfish}
library, though it is available for direct manipulation in less common use
cases.

\section{Inter-domain communication}

Barrelfish provides a uniform interface for passing messages between domains,
which handles message formatting and marshalling, name lookup, and end-point
binding. This interface is \emph{channel-based}: for two domains to
communicate, they must first establish a channel (which may involve agreeing
on some shared state). Messages are thereafter sent on this channel. The
process of establishing channel state is known as \emph{binding}.
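To make the channel and binding model concrete, the sketch below shows the
rough shape of client-side code for a hypothetical Flounder interface called
\texttt{echo}. This is an illustrative sketch only: \texttt{echo} and its
messages do not exist in the tree, and the exact binding and send functions
are generated per-interface by Flounder, so real code should be checked
against the generated headers rather than against this example.

\begin{lstlisting}
/* Illustrative sketch: bind to a service and send a message.
 * The "echo" interface and its messages are hypothetical. */
#include <assert.h>
#include <barrelfish/barrelfish.h>
#include <barrelfish/nameservice_client.h>
#include <if/echo_defs.h>              /* hypothetical generated stubs */

static void bind_cb(void *st, errval_t err, struct echo_binding *b)
{
    assert(err_is_ok(err));
    /* The binding's tx_vtbl holds one send function per message type. */
    err = b->tx_vtbl.ping(b, NOP_CONT, 42);
    assert(err_is_ok(err));
}

static void start_client(void)
{
    iref_t iref;
    /* Look up the interface reference under a well-known name. */
    errval_t err = nameservice_blocking_lookup("echo_service", &iref);
    assert(err_is_ok(err));

    /* Establish the channel. The message transport (LMP, UMP, ...) is
     * chosen automatically during binding; bind_cb runs once the
     * channel is up. */
    err = echo_bind(iref, bind_cb, NULL, get_default_waitset(),
                    IDC_BIND_FLAGS_DEFAULT);
    assert(err_is_ok(err));
}
\end{lstlisting}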
A channel is typed, and the type determines the kinds of messages that can be
sent on it. As with many RPC systems, channel types are defined using an
\emph{interface definition language}, which in Barrelfish is known as
Flounder. Flounder interface types are generally found in the \texttt{/if}
directory in the Barrelfish tree. Messages can pass both simple data types
and, optionally, capabilities.

To establish a communications channel, a domain typically has to acquire an
\emph{interface reference} from somewhere, which identifies the remote
endpoint it will try to connect to. In the common case, this is obtained from
the System Knowledge Base, acting as a name server. Binding to an interface
reference results in a local stub object at each end of the channel, which can
be called to send a message and which also dispatches received messages. In
turn, an interface reference is created by a server creating a service
(analogous to a listening socket) and registering this with a name service.

There are a variety of underlying implementations of message passing in
Barrelfish, known as \emph{Message Transports}. Each one is highly optimized
for a particular hardware scenario. In general, the binding process
automatically selects the appropriate transport for a channel, though this
choice can also be overridden.

A message transport itself consists of several components:
\begin{enumerate}
\item An \emph{interconnect driver} or ICD sends small, fixed-size units of
  data between domains. This is highly optimized: for same-core message
  passing, the LMP (``local message passing'') ICD is based on L4's RPC path,
  while between cache-coherent cores the UMP (``user-level message passing'')
  ICD is similar to URPC~\cite{urpc:tocs91} or
  FastForward~\cite{Giacomoni:2008:FEP:1345206.1345215} and transfers single
  cache lines without involving the kernel. Other ICDs exist for specialized
  messaging hardware, and/or network communication. ICDs do \emph{not} export
  a standard interface; each one is different.
\item \emph{Stubs} generated by Flounder from an interface specification
  provide the uniform interface to clients and servers, abstracting over the
  ICDs and handling message fragmentation and reassembly in cases where an
  application message is larger than the unit transferred by the ICD. Since
  different ICDs do not export the same interface, there is a different
  Flounder code generator for each ICD type, enabling cross-layer
  optimizations not possible otherwise.
\item Finally, some stubs can also invoke separate \emph{Notification
  Drivers}, which provide a synchronous signal to the receiving domain in
  cases where the transfer of the message payload itself does not. For
  example, UMP messages have to be polled by the receiver by default, but on
  some architectures an additional notification driver can be invoked by the
  stub to cause an inter-processor interrupt (IPI) after some time if the
  other end of the channel appears to be asleep.
\end{enumerate}

In addition, stubs can also handle other areas of complexity resulting from
the optimized nature of interconnect drivers. For example, capabilities cannot
be sent directly over a UMP channel, since their bit representations cannot be
safely made available to user-space applications.
Instead, the stubs for UMP transport send capabilities over a separate, secure
channel via the Monitors on the sending and receiving cores, which are
themselves contacted using LMP (which can transfer capabilities) by the
sending and receiving domains. In this way, transfer of pure data over message
transports can be extremely fast, at the cost of extra delay when transferring
capabilities.

\section{System Knowledge Base}

Barrelfish includes a system service called the System Knowledge Base (SKB),
which is used to store, query, unify, and compute on a variety of data about
the current running state of the system. The SKB is widely used in Barrelfish;
examples are:
\begin{itemize}
\item The Octopus lock manager and pub/sub system is built as an extension to
  the SKB (with a somewhat different interface language).
\item Devices are discovered and entered into the SKB, and the SKB contains
  rules to determine the appropriate driver for each device. This forms the
  basis of device management (see below).
\item The PCI driver uses constraint solving in the SKB to correctly configure
  PCI devices and bridges. The SKB contains a list of known PCI quirks, bugs,
  and special cases that have to be taken into account when allocating BAR
  values for address ranges. Interrupt routing is also performed in the SKB.
\item The SKB functions as a general-purpose name server / interface trader
  for the rest of the OS.
\item The results of online hardware profiling are entered into the SKB at
  boot time, and can be used to construct optimal communication patterns in
  the system.
\end{itemize}

The SKB implementation for Intel architectures is currently based on a port of
the ECLiPSe constraint logic programming system~\cite{eclipse}. This consists
of a Prolog interpreter with constraint solving extensions, and has been
extended with a fast key-value store which can also retain capabilities. Aside
from the Octopus interface, it is possible to communicate with the SKB by
sending Prolog expressions as messages.

\section{Device drivers}

Device drivers in Barrelfish are implemented as individual dispatchers or
domains. When they start up, they are supplied with arguments which include
various option variables, together with a set of capabilities which authorize
the driver to access the hardware device. The capabilities a driver needs are
generally:
\begin{enumerate}
\item \emph{Device frame} capabilities for regions of the physical address
  space containing hardware registers; these regions are then memory-mapped by
  the driver domain as part of its initialization sequence.
\item \emph{Interrupt} capabilities, which can be used to direct the local CPU
  driver to deliver particular interrupt vectors to the driver domain.
\item On x86 machines, \emph{I/O capabilities} allow access to regions of the
  I/O port space for the driver.
\item For message-based device interfaces, such as USB, there may be
  \emph{communication end-point capabilities} which allow messages to be sent
  between the driver domain and the driver for the host interface adaptor
  itself.
\end{enumerate}

Hardware interrupts are received by a core, demultiplexed if necessary, and
then turned into local inter-domain messages which are dispatched to the
appropriate driver domain. Each first-level interrupt handler disables the
interrupt as it is raised. As in L4, interrupts are re-enabled by the driver
domain sending a reply message back to the CPU driver.
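As an illustration of how a driver uses these capabilities, the sketch below
shows the typical start-up steps of a simple driver domain. The helper names
(\texttt{map\_device\_frame}, \texttt{register\_interrupt}) are hypothetical
placeholders for the corresponding libbarrelfish and driver-support routines,
which differ between driver frameworks; the point is the sequence of steps,
not the exact API.

\begin{lstlisting}
/* Illustrative driver start-up sketch; helper names are hypothetical. */
#include <barrelfish/barrelfish.h>

static volatile uint32_t *regs;   /* memory-mapped device registers */

static void irq_handler(void *arg)
{
    /* Acknowledge the device, hand completed work to clients, and
     * finally reply to the CPU driver so the interrupt is re-enabled. */
}

static errval_t driver_init(struct capref devframe, struct capref irqcap)
{
    /* 1. Map the device frame capability, uncached, into our vspace. */
    void *va;
    errval_t err = map_device_frame(devframe, &va);       /* hypothetical */
    if (err_is_fail(err)) {
        return err;
    }
    regs = va;

    /* 2. Use the interrupt capability to ask the CPU driver to deliver
     *    the device's interrupt vector to this dispatcher as messages. */
    err = register_interrupt(irqcap, irq_handler, NULL);  /* hypothetical */
    if (err_is_fail(err)) {
        return err;
    }

    /* 3. Program the device through the mapped registers (usually via
     *    Mackerel-generated accessors rather than raw pointer arithmetic). */
    return SYS_ERR_OK;
}
\end{lstlisting}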
Driver domains themselves then export their functionality to clients (such as
file systems or network stacks) via further message-passing interfaces.

\section{Device management}

Device management in Barrelfish is performed by a combination of components:
\begin{itemize}
\item The System Knowledge Base is used to store information about all
  discovered hardware in the system. It also holds the information about which
  driver binaries should be used with which devices, and contains rules to
  determine on which core each device's driver domain should run.
\item Octopus, the Barrelfish locking and pub/sub system built on top of the
  SKB, is used to propagate events corresponding to device discovery, hotplug,
  and driver startup and shutdown.
\item Kaluga is the Barrelfish device manager, and is responsible for starting
  up driver domains based on information in the SKB and in response to events
  disseminated by Octopus. It handles passing the appropriate authorizations
  (in the form of device, I/O, and interrupt capabilities) to new driver
  domains.
\item The driver domains themselves populate the SKB with further information
  about devices. For example, the PCI driver stores information about
  enumerated buses and functions in the SKB, and USB Host Adaptor drivers do
  something similar for enumerated devices.
\end{itemize}

In newer versions of Barrelfish, the same framework is also used for booting
(and suspending or shutting down) cores. In addition to an appropriate CPU
driver, each core other than the bootstrap core has a \emph{Boot Driver},
which is responsible for booting the new core and encapsulates the (usually
platform-specific) protocol for core startup.

\section{Monitor}

Each Barrelfish core runs a special process called the \emph{Monitor}, which
is responsible for distributed coordination between cores. All monitors
maintain a network of communication channels among themselves; any monitor can
talk to (and identify) any other monitor. All dispatchers on a core have a
local message-passing channel to their monitor.

Monitors are trusted: they can fabricate capabilities. This is because they
are responsible for transferring capabilities between cores, and so must be
able to serialize a capability into bits and reconstruct it at the other end.
Monitors therefore possess a special capability (the \emph{Kernel capability})
which allows them to manipulate their local core's capability database.

The monitor performs many low-level OS functions which require inter-core
communication (since CPU drivers by design do not communicate with each other,
nor share any memory). For example:
\begin{itemize}
\item Monitors route inter-core bind requests for communication channels
  between domains which have not previously communicated directly.
\item They send capabilities between cores along side channels, since regular
  domains on different cores cannot send capabilities directly themselves
  without involving the kernel.
\item Monitors help with domain startup by supplying dispatchers with useful
  initial capabilities (such as a communication channel back to the monitor,
  and ones to the memory allocator and SKB).
\item They perform distributed capability revocation, using a form of
  two-phase commit protocol among themselves over the capability database.
\end{itemize}

The monitor contains a distributed implementation of the functionality found
in the lower levels of a monolithic kernel.
The choice to make the monitor a user-space process rather than amalgamating
it with the CPU driver was an engineering decision: it results in lower
performance (many operations which would be a single syscall on Unix require
two full context switches to and from the monitor on Barrelfish). However,
running the monitor as a user-space process means it can be time-sliced along
with other processes, can block when waiting for I/O (the CPU driver has no
blocking operations), can be implemented using threads, and provides a useful
degree of fault isolation (it is quite hard to crash the CPU driver in
Barrelfish, since it performs no blocking operations and requires no dynamic
memory allocation).

\section{Memory Servers}

Memory servers are responsible for serving RAM capabilities to domains. A
memory server is started with an initial (small) set of capabilities which
list the regions of memory for which the server is responsible, and it
responds to requests from other domains for RAM capabilities of different
sizes.

At system startup, there is an initial Memory Server running on the bootstrap
core which is created with a capability to the entire range of RAM addressable
from that core. However, the use of capabilities allows this server to
delegate management of subregions of this space to other servers. This is
desirable for two reasons:
\begin{itemize}
\item It allows cores to have their own memory allocators, greatly improving
  the parallelism and scalability of the system. As in other systems,
  Barrelfish's per-core allocators can also steal memory from other cores if
  they run short.
\item It permits different allocators for different types of memory, such as
  different NUMA nodes, or low memory accessible to legacy DMA devices.
\end{itemize}

Each new dispatcher in Barrelfish is started with a bootstrapped message
channel to its own local memory server.

\section{Filing system}

Barrelfish at present has no native file system. However, runtime libraries do
provide a Virtual File System (VFS) interface, and a number of backends to
this exist (and are used in the system), including:
\begin{itemize}
\item An NFSv3 client.
\item A simple RAM-based file system.
\item Access to the OS multiboot image via the VFS interface.
\item A FAT file system which can be used with the AHCI driver and ATA
  library.
\end{itemize}

\section{Network stack}

\begin{figure}[hbt]
  \begin{center}
    \includegraphics[width=0.7\columnwidth]{net-arch.pdf}
  \end{center}
  \caption{High level overview of the Barrelfish network stack}\label{fig:net-arch}
\end{figure}

At the time of writing, the Barrelfish network stack is evolving, but
Figure~\ref{fig:net-arch} shows the high-level structure. Barrelfish aims to
remove as much OS code from the datapath as possible, and uses a design
inspired by Nemesis~\cite{Black:1997:PIV:648046.745222} and
Exokernel~\cite{Ganger:2002:FFA:505452.505455}.

Each physical network interface has a driver which is started by the Kaluga
device manager service. This driver is in turn capable of initiating a
separate \emph{queue manager} for each hardware queue supported by the network
hardware, and of configuring the NIC hardware to demultiplex incoming packets
to the appropriate queue. Applications run their own network stack on a
per-flow basis, either reading and writing packets directly to and from
hardware queues (if possible), or with the queue manager performing an extra
level of multiplexing on each queue. The queue manager receives events from
the NIC hardware signalling transmission and reception of packets.
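The sketch below gives a rough picture of what a per-flow, application-level
receive path might look like under this design. All of the names here are
hypothetical placeholders (the real queue interfaces in the Barrelfish tree
have gone through several iterations); the point being illustrated is the
structure: the application owns its buffers, posts them to its queue for
reception, and polls (or waits for a notification) until the NIC has
demultiplexed packets for this flow into them.

\begin{lstlisting}
/* Purely illustrative per-flow receive loop; all names are hypothetical. */
#include <barrelfish/barrelfish.h>

struct rx_queue;                        /* handle to one hardware/SW queue */
struct packet { void *buf; size_t len; };

/* Hypothetical queue operations provided by the driver/queue manager. */
extern errval_t queue_post_rx(struct rx_queue *q, void *buf, size_t len);
extern errval_t queue_poll_rx(struct rx_queue *q, struct packet *p);
extern void    *alloc_packet_buffer(size_t len);
extern void     process_packet(void *buf, size_t len);

static void rx_loop(struct rx_queue *q)
{
    /* Hand a set of application-owned buffers to the queue for reception. */
    for (int i = 0; i < 32; i++) {
        queue_post_rx(q, alloc_packet_buffer(2048), 2048);
    }

    for (;;) {
        struct packet p;
        /* Poll (or block on a notification) until the NIC, demultiplexing
         * on this flow, has delivered a packet into one of our buffers. */
        if (err_is_ok(queue_poll_rx(q, &p))) {
            process_packet(p.buf, p.len);   /* application protocol code */
            queue_post_rx(q, p.buf, 2048);  /* recycle the buffer */
        }
    }
}
\end{lstlisting}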
An additional domain, the network daemon (\texttt{netd}), is responsible for
non-application network traffic (such as ARP requests and responses, and ICMP
packets). This daemon also handles allocation of ports for talking to a given
network interface.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{The Barrelfish source tree}\label{chap:sourcetree}

\newenvironment{dirlist}{%
\let\olditem\item%
\renewcommand\item[2][]{\olditem \texttt{##1}:\\[0.3\baselineskip]##2}%
\begin{description}}{\end{description}%
}

The Barrelfish source tree is organized as follows:

\begin{dirlist}
\item[capabilities/] Definitions of the capability type system, written in the
  Hamlet language.
\item[devices/] Definitions of hardware devices, written in the Mackerel
  language.
\item[doc/] \LaTeX{} source code for the Barrelfish Technical Notes, including
  this one.
\item[errors/] Definitions of Barrelfish system error codes, written in the
  Fugu language.
\item[hake/] Source code for the Hake build tool.
\item[if/] Definitions of Barrelfish message-passing interface types, written
  in the Flounder language.
\item[include/] Barrelfish general header files.
\item[kernel/] Source code for Barrelfish CPU drivers.
\item[kernel/include/] Header files internal to Barrelfish CPU drivers.
\item[lib/] Source code for libraries.
\item[tools/] Source code for a variety of tools used during the build
  process.
\item[trace\_definitions/] Constant definitions for use by the Barrelfish
  tracing infrastructure, written in the Pleco language.
\item[usr/] Source code for Barrelfish binaries.
\item[usr/drivers/] Source code for Barrelfish device driver binaries.
\end{dirlist}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{The Barrelfish build tree}\label{chap:buildtree}

Successfully building Barrelfish results in a series of files and directories
in the build directory. Regardless of the architectures requested, the
following are likely to be present:

\begin{dirlist}
\item[docs/] PDF files corresponding to the Barrelfish technical notes,
  including this one. Note that these files are probably not built by default;
  you may have to invoke \texttt{make docs}.
\item[hake/] Build files and binary for \texttt{hake}, the Barrelfish build
  tool. You should not need to look in here unless you need to edit
  \texttt{Config.hs}, the system-wide configuration file.
\item[Hakefiles.hs] A very large Haskell source file containing all the
  Hakefiles in the system, concatenated together along with some dynamic
  information on the files in the build tree. You should not need to refer to
  this file unless you encounter a bug in Hake.
\item[Makefile] A very large Makefile, which contains explicit rules to build
  every file (including intermediate results) in the Barrelfish build tree. It
  is sometimes useful to search through this file if the build is failing in
  some unexpected way. The only files this Makefile includes are
  \texttt{symbolic\_targets.mk} and generated C dependency files; all other
  make information is in this one file.
\item[menu.lst] An initial file for booting Barrelfish via GRUB.
\item[skb\_ramfs.cpio.gz] A RAM filing system image in the form of a CPIO
  archive containing support Prolog files for the Barrelfish System Knowledge
  Base (SKB).
\item[sshd\_ramfs.cpio.gz] A RAM filing system image in the form of a CPIO
  archive containing support files for the Barrelfish port of the OpenSSH
  server.
\item[symbolic\_targets.mk] A file containing symbolic \texttt{make} targets,
  since all rules in the main, generated Makefile are explicit. This file
  contains the definitions necessary to build complete platform images of
  Barrelfish for various hardware architectures.
\item[tools/bin/] Directory containing binaries (for the host system) of
  various tools needed to build Barrelfish, such as \texttt{flounder},
  \texttt{fugu}, \texttt{hamlet}, and \texttt{mackerel}.
\item[tools/tools/] Intermediate object files for building the tools binaries.
\item[tools/tmp/] Miscellaneous intermediate files, mostly from building
  Technical Note PDF files.
\end{dirlist}

In addition, there will be a directory for each architecture for which this
build was configured, such as \texttt{x86\_64}, \texttt{x86\_32},
\texttt{ARMv7}, etc. This will contain the following items of interest (taking
\texttt{x86\_64} as a concrete example):

\begin{dirlist}
\item[x86\_64/capabilities/] C code generated by Hamlet to encode the
  capability type system.
\item[x86\_64/errors/] C code generated by Fugu to encode the core OS error
  conditions.
\item[x86\_64/include/] Assorted include files, both generated and copied from
  the source tree.
\item[x86\_64/include/dev] Device access function header files generated by
  Mackerel.
\item[x86\_64/include/if] Stub definition header files generated by Flounder.
\item[x86\_64/kernel/] Intermediate build files for all the CPU drivers built
  for this architecture.
\item[x86\_64/lib/] Static libraries for the system, together with
  intermediate build files for the libraries.
\item[x86\_64/sbin/] All the executable binaries built for this architecture.
  The CPU driver is generally known as \texttt{/sbin/cpu}, or a similar name
  more specific to a given core.
\item[x86\_64/tools/] Miscellaneous tools specific to a target architecture.
  Typically ELF code, bootloaders, and code to calculate assembly offsets from
  C definitions.
\item[x86\_64/trace\_definitions/] Generated files giving constants for the
  tracing system in C and JSON (for use by Aquarium2).
\item[x86\_64/usr/] Intermediate build files for all the executable binaries
  whose source is in \texttt{/usr}.
\end{dirlist}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{An overview of Barrelfish documentation}\label{chap:docs}

Barrelfish is a rapidly-changing research operating system, and consequently
it is difficult and resource-intensive to keep its documentation coherent and
up to date. However, efforts are made to document the system in a number of
ways, summarized below.

\section{Technical notes}

The Barrelfish source contains a number of technical notes, which are
rough-and-ready (and incomplete) documentation, tutorials, reference manuals,
etc.\ for the system. The documents evolve over time, but are often the best
source of documentation for the system. This document is Barrelfish Technical
Note 0. The remaining technical notes are as follows:

\begin{enumerate}
\item \textbf{Glossary:} A list of Barrelfish-specific terms with definitions.
\item \textbf{Mackerel:} Reference manual for the Barrelfish language for
  specifying hardware registers and in-memory data structure layouts.
\item \textbf{Hake:} A comprehensive manual for the Barrelfish build utility.
\item \textbf{Virtual Memory in Barrelfish (\textit{obsolete}):} A brief
  description of virtual memory support in Barrelfish, now superseded by Simon
  Gerber's dissertation ``Virtual Memory in a Multikernel''.
\item \textbf{Barrelfish on the Intel Single-chip Cloud Computer:} A
  description of the Barrelfish support for the SCC (Rock Creek) processor,
  together with some performance evaluations and discussion of the hardware.
\item \textbf{Routing in Barrelfish:} An early design for routing inter-core
  messages over multiple hops within the system.
\item \textbf{Tracing and Visualization:} The Barrelfish trace infrastructure
  and associated tools.
\item \textbf{Message notifications:} A discussion of the use of notification
  drivers to mitigate the effect of polling when using UMP message channels.
\item \textbf{Specification:} A document specifying the Barrelfish API and the
  core kernel implementation.
\item \textbf{Inter-dispatcher communication in Barrelfish:} Documentation for
  the IDC system, including the Flounder interface definition language, and
  some of the more commonly used interconnect drivers and message transports.
\item \textbf{Barrelfish OS Services:} A list of services running in a typical
  Barrelfish machine, together with their interdependencies.
\item \textbf{Capability Management in Barrelfish:} Documentation for the
  user-level capability primitives for manipulating caprefs and CNodes.
\item \textbf{Bulk Transfer:} A proposed design for transferring large
  contiguous areas of memory between Barrelfish domains.
\item \textbf{A Messaging interface to disks:} A comprehensive description of
  the Barrelfish AHCI driver.
\item \textbf{Serial ports:} A discussion of how serial ports are represented
  in Barrelfish, both inside the CPU driver and from user space.
\end{enumerate}

Barrelfish technical notes are built from the Barrelfish tree (using
\texttt{make Documentation}), and are also available pre-built from the
Barrelfish web site at \url{http://www.barrelfish.org/}.

\section{The Barrelfish Wiki}

The Barrelfish project public wiki (at \url{http://wiki.barrelfish.org/})
contains a variety of additional documentation, including many contributions
from users outside the core Barrelfish team.

\section{Academic publications}

Many publications related to Barrelfish can be found on the Barrelfish web
site. They are not generally detailed documentation for the system, but often
convey high-level design decisions, concepts, and rationales.

\section{Doxygen}

The Barrelfish libraries are annotated for use with Doxygen.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\bibliographystyle{abbrv}
\bibliography{barrelfish}

\end{document}