1%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2% Copyright (c) 2013-2016, ETH Zurich.
3% All rights reserved.
4%
5% This file is distributed under the terms in the attached LICENSE file.
6% If you do not find this file, copies can be found by writing to:
7% ETH Zurich D-INFK, Universitaetstr. 6, CH-8092 Zurich. Attn: Systems Group.
8%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
9
10\documentclass[a4paper,twoside]{report} % for a report (default)
11
12\usepackage{bftn} % You need this
13\usepackage{multirow}
14\usepackage{listings}
15\usepackage{color}
16\usepackage{xspace}
17
18\title{Barrelfish on ARMv7-A}   % title of report
19\author{Simon Gerber \and Stefan Kaestle \and Timothy Roscoe \and
20  Pravin Shinde \and Gerd Zellweger}
21\tnnumber{017}  % give the number of the tech report
22\tnkey{ARMv7-A} % Short title, will appear in footer
23
24% \date{Month Year} % Not needed - will be taken from version history
25
26\newcommand{\todo}[1]{\note{\textbf{TODO:} #1}}
27
28\begin{document}
29\maketitle
30
31\newcommand{\code}[1]{{\lstinline!#1!}}
32\newcommand{\file}[1]{{\lstinline!#1!}}
33\newcommand{\mode}[1]{\texttt{#1} mode\xspace}
34
35%configure listings properly
36\lstset{%
37  basicstyle=\small\ttfamily,
38  escapechar=@
39}
40
41%
42% Include version history first
43%
44\begin{versionhistory}
45\vhEntry{0.1}{05.12.2013}{SK}{Initial version}
46\vhEntry{0.2}{08.12.2015}{TR}{Rewritten for new ARMv7 code}
47\vhEntry{1.0}{31.05.2016}{TR}{Newly-factored ARMv7 platform support}
48\end{versionhistory}
49
50% \intro{Abstract}		% Insert abstract here
51% \intro{Acknowledgements}	% Uncomment (if needed) for acknowledgements
52\tableofcontents		% Uncomment (if needed) for final draft
53% \listoffigures		% Uncomment (if needed) for final draft
54% \listoftables			% Uncomment (if needed) for final draft
55
56\lstset{
57  language=C,
58  basicstyle=\ttfamily \small,
59  flexiblecolumns=false,
60  basewidth={0.5em,0.45em},
61  boxpos=t,
62}
63
64\newcommand{\eclipse}{ECL\textsuperscript{i}PS\textsuperscript{e}\xspace}
65\newcommand{\codesize}{\scriptsize}
66\newcommand{\note}[1]{[\textcolor{red}{\emph{#1}}]}
67
68%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
69\chapter{Introduction}
70
71This document describes the state of support for ARMv7-A processors in
72Barrelfish.
73
74ARM hardware is highly diverse, and has evolved over time.  As a
75research OS, Barrelfish focusses ARM support on a small number of
76platforms based on wide availability, ease of maintenance, and
77research interest.   However, since management of hardware complexity
78and diversity is also a research goal of the Barrelfish project, we
79aim to make it easy to add new ARM-based platforms with a mixture of
80traditional and non-traditional engineering techniques. 
81
82The principal processors with 32-bit ARM support in Barrelfish at present are
83ARMv7-A (Cortex A-series), in particular the Cortex A9. 
84
85Past support for older ARM 32-bit architectures in Barrelfish included:
86\begin{itemize}
\item ARMv7-M (Cortex M-series), in particular the Cortex M3.
88\item ARMv5 processors, in particular the Intel iXP2800 network
89  processor (which uses an XScale core). 
90\item ARMv6 (ARM11MP) processors running under simulation in
91  \file{qemu}. 
92\end{itemize}
93
94The main 32-bit ARM-based systems we target at present are:
95\begin{itemize}
96\item The Texas Instruments OMAP4460 SoC used in the Pandaboard ES
97  platform. 
98\item The ARM VExpress\_EMM board, under emulation in the GEM5
99  simulator. 
100\end{itemize}
101
102%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
103\chapter{Compilation}
104\label{sec:armcompile}
105
106Building Barrelfish with ARMv7 is straightforward; detailed
107requirements for packages are described in the latest README file.
108
Compiling ARM support in Barrelfish requires a cross-compilation
toolchain on the programmer's \code{PATH}.  For ARMv7 support we
track the GNU toolchain shipped with Ubuntu LTS (14.04.3 at the time of
writing).
113
114Once you have the right tools, run hake with the correct options,
115e.g.:
116\begin{lstlisting}
117$ cd /build/barrelfish
118$ /git/barrelfish/hake/hake.sh -a armv7 -s /git/barrelfish 
119...
120$
121\end{lstlisting}
122
123After running \code{hake} with appropriate architecture support
124(i.e. use \code{-a armv7}), you can ask the Makefile what platforms it
125supports:
126
127\begin{lstlisting}
128$ make help-platforms
129------------------------------------------------------------------
130Platforms supported by this Makefile.  Use 'make <platform name>':
131 (these are the platforms available with your architecture choices)
132
133 Documentation:
134	 Documentation for Barrelfish
135 PandaboardES:
136	 Standard Pandaboard ES build image and modules
137 ARMv7-GEM5:
138	 GEM5 emulator for ARM Cortex-A series multicore processors
139------------------------------------------------------------------
140$ 
141\end{lstlisting}
142
143Then build:
144
145\begin{lstlisting}
146$ make -j 8 PandaboardES
147\end{lstlisting}
148
149\section{Building for GEM5}
150
151To boot Barrelfish in GEM5, in addition to the previous steps you
152will need a supported version of GEM5.  The GEM5 website
153(\url{gem5.org}) has comprehensive information. 
154
155Unfortunately, different
156versions of GEM5 manifest different subtle bugs when emulating ARM
157systems.  We recommend revision 0fea324c832c of GEM5 at present;
158please let us know if you find a more recent version that works well. 
159
160To fetch and build GEM5 on Ubuntu LTS:
161
162\begin{lstlisting}
163$ sudo apt-get install scons swig python-dev libgoogle-perftools-dev m4 protobuf-compiler libprotobuf-dev
164$ hg clone http://repo.gem5.org/gem5 -r 0fea324c832c gem5
165adding changesets
166adding manifests
167adding file changes
168added 9356 changesets with 53499 changes to 6576 files
169updating to branch default
1703269 files updated, 0 files merged, 0 files removed, 0 files unresolved
171$ cd ./gem5 
172$ scons build/ARM/gem5.fast
173...
174
175$
176\end{lstlisting}
177
178GEM5 is a large system and may take some time to build.  In addition,
179you may have to install minor fixes to ensure compilation (I had to
180add some initializers to \file{mem/ruby/network/orion/Wire.cc}, for
181example). 
182
183After the compilation of GEM5 is finished, add the binary to your PATH.
184
185Now, build Barrelfish like this:
186\begin{lstlisting}
187$ make -j 8 ARMv7-GEM5
188\end{lstlisting}
189
190It's a good idea to set \code{armv7_platform} in
191\file{<build_dir>/hake/Config.hs} to \texttt{gem5} in order to enable
192the cache quirk workarounds for GEM5 and proper offsets for the
193platform simulated by GEM5.
194
195You can also build Barrelfish and boot inside GEM5 in a single step:
196
197\begin{lstlisting}
198$ make help-boot
199------------------------------------------------------------------
200Boot instructions supported by this Makefile.  Use 'make <boot name>':
201 (these are the targets available with your architecture choices)
202
203 gem5_armv7:
204	 Boot an ARMv7a multicore image in GEM5
205 gem5_armv7_detailed:
206	 Boot an ARMv7a multicore image in GEM5 using a detailed CPU model
207$ make gem5_armv7
208...
209\end{lstlisting}
210
To see the console output of Barrelfish, connect to the simulated serial port:
212\begin{lstlisting}
213$ telnet localhost 3456
214\end{lstlisting}
215
216GEM5 is a highly configurable simulator.  You can print the supported
217options of the GEM5 script as follows:
218
219\begin{lstlisting}
220$ gem5.fast gem5/gem5script.py -h
221\end{lstlisting}
222
Note that if you boot using \code{make gem5_armv7_detailed} rather than
\code{make gem5_armv7}, the simulation takes much longer (depending on
your machine, up to an hour just to boot Barrelfish).
226 
227%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
228\chapter{Hardware assumptions and limitations}
229
230The current state of ARMv7 support in Barrelfish makes a number of
231assumptions about the underlying hardware platform, and also imposes
232some limitations.  We discuss these here.
233
234\section{No support for Large Physical Address Extensions}
235
236The current Barrelfish design does not support LPAE for 32-bit ARM
237processors.  Instead, it assumes a 32-bit physical address space.
238Supporting LPAE would require changes to the paging code, but would
239also require a mechanism to address user memory from the kernel
240effectively (see below). 
241
242\section{Physical RAM starts at 2GB}
243
244Within the 32-bit physical address space, RAM is assumed to start at
245the 2GB boundary (i.e. \code{0x80000000}).  This is the
246architectural recommendation for Cortex-A series processors, and we
247have yet to encounter non-LPAE ARMv7-A hardware which does not do
248this.  Changing this assumption in the code should be possible, but in
249practice is likely to be dominated by the other limitations mentioned
250here. 
251
252\section{Physical RAM is limited to 1GB}
253
The Barrelfish ARMv7 CPU drivers can handle up to 1GB RAM,
contiguously situated in the physical address space starting at 2GB.
This limit could be raised by half a gigabyte or so, at the cost of
space for mapping kernel devices.  In practice, the CPU driver does not
need to map many kernel devices since most drivers run in user space on
Barrelfish.  Consequently, the boundary between 1-1 mapped RAM and
kernel hardware devices within the top 2GB of the virtual address
space could easily be moved.
262
263However, it remains that the total RAM visible to the CPU \emph{plus}
264the mappings for any devices needed by the CPU driver must fit into
265the top 2GB of the address space (mapped by the TTBR1 register).   
266
In particular, the CPU driver assumes that all physical RAM is mapped
1-1, and relies on this when performing capability invocations.  If
the system had more RAM than could be mapped 1-1 into kernel virtual
address space, we would need a method for the CPU driver to quickly
access arbitrary physical addresses, entailing some kind of paging
system.
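As an illustration of the 1-1 assumption, the following C sketch shows how a physical RAM address can be used directly as a kernel virtual address, provided it falls inside the 1GB window starting at 2GB. The constant and function names (\code{RAM_PHYS_BASE}, \code{phys_to_kernel_virt}, and so on) are hypothetical and do not come from the Barrelfish source; returning 0 for unmapped addresses is likewise an assumption made for the sketch.

\begin{lstlisting}[language=C]
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: RAM starts at 2GB, at most 1GB is mapped 1-1. */
#define RAM_PHYS_BASE   0x80000000u
#define RAM_MAX_SIZE    0x40000000u   /* 1GB */

/* True if paddr lies within the 1-1 mapped RAM window. */
static bool ram_is_mapped(uint32_t paddr)
{
    return paddr >= RAM_PHYS_BASE && paddr - RAM_PHYS_BASE < RAM_MAX_SIZE;
}

/* Translate a physical RAM address to a kernel virtual address.
 * Because RAM is mapped 1-1, the translation is the identity; addresses
 * outside the window have no kernel mapping and yield 0 in this sketch. */
static uint32_t phys_to_kernel_virt(uint32_t paddr)
{
    return ram_is_mapped(paddr) ? paddr : 0;
}
\end{lstlisting}

Any RAM beyond the 1GB window fails this check, which is exactly where a more elaborate, paging-based access mechanism would be needed.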
273
274
275%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
276\chapter{Organization of the address space}
277
278Like many other popular operating systems, Barrelfish employs a memory
279split. The idea behind a memory split is to separate kernel code from
280user space code in the virtual address space. This allows the kernel
281to be mapped in every virtual address space of each user space
282program, which is necessary to allow user space code to access kernel
features through the system call interface. If the kernel were not
mapped into the virtual address space of each program, it would be
impossible to jump to kernel code without switching the virtual
address space.
287
288\begin{figure}[htb]
289  \centering
290  \includegraphics[width=8cm]{figures/virtual_addressing.pdf}
291  \caption{Barrelfish virtual address space layout for ARMv7-A}
292  \label{fig:memory_layout}
293\end{figure}
294
Additionally, ARMv7-A provides two translation table
base registers, TTBR0 and TTBR1. We configure the system to use
TTBR0 for translating virtual addresses below 2GB and
TTBR1 for virtual addresses above 2GB. This saves us from explicitly
mapping the kernel pages into the L1 page table of every process.
Even though the kernel is mapped into each virtual address space, it is
inaccessible to user space programs: any user-mode access to kernel
memory leads to a page fault. Since many mappings can point to
the same physical memory, this technique does not increase memory
usage.
305
306Figure~\ref{fig:memory_layout} shows the memory layout of the complete
307virtual address space of a single ARMv7-A core running Barrelfish. 
308
309We have a memory split at 2GB, where everything upwards is only
310accessible in privileged modes and the lower 2GB of memory is
311accessible for user space programs. 
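On ARMv7-A, the choice between TTBR0 and TTBR1 is controlled by the N field of the TTBCR register: with the short-descriptor translation table format and N greater than zero, a virtual address whose top N bits are all zero is translated via TTBR0, and any other address via TTBR1. A split at 2GB therefore corresponds to N = 1. The sketch below models this selection; the function and enumeration names are hypothetical.

\begin{lstlisting}[language=C]
#include <stdint.h>

enum ttbr { TTBR0, TTBR1 };

/* Model of ARMv7-A short-descriptor TTBR selection.
 * ttbcr_n is the TTBCR.N field (0..7). With N == 0 every address is
 * translated via TTBR0; otherwise an address whose top N bits are all
 * zero uses TTBR0 and every other address uses TTBR1. */
static enum ttbr ttbr_for_va(uint32_t va, unsigned ttbcr_n)
{
    if (ttbcr_n == 0) {
        return TTBR0;
    }
    return (va >> (32 - ttbcr_n)) == 0 ? TTBR0 : TTBR1;
}
\end{lstlisting}

With N = 1, \texttt{0x7FFFFFFF} is the last address translated via TTBR0 and \texttt{0x80000000} the first translated via TTBR1, matching the 2GB split.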
312
313The kernel runs out of the kernel virtual address space where system
314RAM is mapped 1-1; in the region between \texttt{0x80000000} and
315\texttt{0xC0000000} RAM is mapped directly physical-to-virtual.
316
317The L1 page table of the kernel address space is located inside the
318data segment of the kernel right after the
319kernel and naturally aligned to 16KB.  
320
We map the whole available physical memory into the kernel's virtual
address space using ``sections'' (1MB large pages), obviating the need
for a kernel L2 page table.
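The following C sketch shows how such 1MB section entries could be built using the ARMv7-A short-descriptor format, and how 1GB of RAM at \texttt{0x80000000} could be mapped 1-1 into a 16KB-aligned L1 table of 4096 entries. The macro and function names are hypothetical, and the choice of write-back caching (TEX = 0, C = B = 1) and domain 0 is an assumption, not necessarily what the Barrelfish CPU driver uses.

\begin{lstlisting}[language=C]
#include <stdint.h>

/* ARMv7-A short-descriptor first-level section entry fields. */
#define L1_TYPE_SECTION        0x2u          /* bits[1:0] = 0b10 */
#define L1_SECTION_B           (1u << 2)     /* bufferable */
#define L1_SECTION_C           (1u << 3)     /* cacheable */
#define L1_SECTION_AP_PRIV_RW  (1u << 10)    /* AP[2:0] = 0b001: PL1 RW, PL0 none */
#define SECTION_SHIFT          20            /* 1MB sections */
#define SECTION_MASK           0xFFF00000u

/* A 4096-entry L1 table covers 4GB and must be 16KB aligned. */
static _Alignas(16384) uint32_t kernel_l1[4096];

/* Build a kernel-only, read/write section entry for paddr's 1MB section.
 * With TEX = 0b000, setting C and B selects write-back caching. */
static uint32_t section_entry(uint32_t paddr, int cacheable)
{
    uint32_t e = (paddr & SECTION_MASK) | L1_TYPE_SECTION | L1_SECTION_AP_PRIV_RW;
    if (cacheable) {
        e |= L1_SECTION_C | L1_SECTION_B;
    }
    return e;
}

/* Map [base, base + size) 1-1 using sections; base and size are assumed
 * to be multiples of 1MB. */
static void map_ram_1to1(uint32_t base, uint32_t size)
{
    for (uint32_t off = 0; off < size; off += 1u << SECTION_SHIFT) {
        uint32_t addr = base + off;
        kernel_l1[addr >> SECTION_SHIFT] = section_entry(addr, 1);
    }
}
\end{lstlisting}

Each L1 entry covers 1MB, so the index of an entry is the virtual address shifted right by 20 bits; RAM at \texttt{0x80000000} starts at index \texttt{0x800}.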
324
325Above \texttt{0xC0000000}, the CPU driver maps regions of physical
326memory corresponding to hardware devices it needs to directly access
327(typically the UARTs, interrupt controller, timers, Snoop Control
328Unit, and a few others).  These are also mapped using sections.
329Virtual address regions are allocated in 1MB increments (the size of a
330section mapping) working down from the top section, which is used to
331map the area of RAM containing the CPU driver's exception vectors. 
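The top-down allocation of device mappings can be sketched as follows. This is a hypothetical allocator, not the Barrelfish implementation: it assumes the top section (\texttt{0xFFF00000}) is already reserved for the exception vectors, rounds each request up to whole 1MB sections, and refuses allocations that would extend below \texttt{0xC0000000}.

\begin{lstlisting}[language=C]
#include <stdint.h>

#define SECTION_SIZE     0x00100000u   /* 1MB */
#define DEVICE_VA_FLOOR  0xC0000000u   /* lowest address for device mappings */
#define VECTORS_SECTION  0xFFF00000u   /* top section, holds exception vectors */

/* Hypothetical top-down allocator for kernel device mappings. */
struct dev_va_alloc {
    uint32_t next;   /* lowest virtual address handed out so far */
};

static void dev_va_init(struct dev_va_alloc *a)
{
    a->next = VECTORS_SECTION;   /* the top section is already in use */
}

/* Reserve enough whole sections for size bytes and return the base of the
 * region, or 0 if size is 0 or the window above DEVICE_VA_FLOOR is full. */
static uint32_t dev_va_alloc(struct dev_va_alloc *a, uint32_t size)
{
    uint32_t nsections = size / SECTION_SIZE + (size % SECTION_SIZE != 0);
    uint32_t avail = (a->next - DEVICE_VA_FLOOR) / SECTION_SIZE;
    if (nsections == 0 || nsections > avail) {
        return 0;
    }
    a->next -= nsections * SECTION_SIZE;
    return a->next;
}
\end{lstlisting}

Allocating downwards from the top keeps the device region contiguous with the vector section, so the remaining gap above the 1-1 mapped RAM stays in one piece.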
332
Below \texttt{0x80000000}, all mappings are handled by TTBR0 and
changed on every context switch.  At startup, the kernel uses another
page table (also 16KB-aligned and located inside its data segment) to
map low memory 1-1 as well, as a way to access
337hardware devices in this region before the rest of the system has come
338up.  However, after the early stages of bootstrap this table is no
339longer used. 
340
341Instead, TTBR0 is always loaded with the address of a user domain's
342hardware page table and changes on a context switch.  TTBR1 does not
343change, ensuring the kernel mappings are static after boot.
344
345%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
346\chapter{Boot sequence}
347
348\section{BSP (initial) core}
349
350\begin{enumerate}
\item \file{boot.S:start} is called by the bootloader.  It
  puts the processor into \mode{System}, sets up the (single) kernel
  stack and the global object table pointer, and jumps to
  \file{arch_init}.
355\item \file{init.c:arch_init} is called with a single
356  argument: the address of the multiboot info block.  It first
  initializes the serial console (\file{serial_early_init}) and
358  checks to see if this is the BSP.  If so, it calls
359  \file{bsp_init}. 
360\item \file{init.c:bsp_init} reads information from the multiboot
361  info into the global data structure, initializing it.  It also
362  resets global spinlocks, and sizes RAM (though this information is
363  not yet used).  It returns.
\item \file{init.c:arch_init} continues by initializing paging,
  calling:
366\item \file{paging.c:paging_init} populates the two initial page
367  tables (one for each base register).  The kernel (upper) page table
368  is initialized to map 1GB of RAM at 0x80000000, and the exception
369  vectors at the top of memory.  The initial user (lower) page table
370  is set to map the lower 2GB of the physical address space 1-1 to
371  enable early device access.  The MMU is then enabled.
\item \file{init.c:arch_init} continues with the MMU enabled by
  jumping to:
\item \file{init.c:arch_init_2}, which initializes exceptions,
  relocates the current KCB, parses the command line arguments, and
  re-initializes the serial ports so that the UART hardware is now
  mapped correctly into kernel address space with a section mapping.
378
  It then initializes the GIC, the Snoop Control Unit, the Global
  Timer, and the Time Slice Counter.  Cycle counter access from
  \mode{User} is enabled, and the coreboot spawn handler is set up.  It
  then calls:
383
\item \file{startup_arch.c:arm_kernel_startup}, which initializes
  a simple memory allocator from the global structure, allocates
  a new KCB, and calls:
387
388\item \file{startup_arch.c:spawn_bsp_init} which creates the
389  initial kernel data structures for spawning the init process.  It
390  also creates the initial capabilities for init to use to allocate
391  memory, and returns. 
392
\item \file{startup_arch.c:arm_kernel_startup} continues
  by calling \code{dispatch} on the init DCB, and we are now up and
  running.
396\end{enumerate}
397
398%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
399\chapter{Exception code paths}
400
401ARMv7-A exceptions are initialized in
402\file{exceptions.S:exceptions_init}, which for some reason is
403written in assembly.  It assumes the core is running in \mode{System}. 
404
405There is a 256-byte statically-allocated stack for each exception
406mode, and an 8kB stack used for subsequently calling into C in
407\mode{System}, all defined in \file{exceptions.S}. 
408
409Most exception handlers in the vector table start by checking whether
410the processor was in \mode{User} or not when the trap happened.  In most
411cases, if the processor was not in \mode{User}, the result is that
412\mode{System} is entered and the processor jumps to
413\file{exn.c:fatal_kernel_fault}, which panics.   Exceptions to this
414rule are noted below. 
415
For exceptions taken while the processor is in \mode{User}, the
address of the current (user space) dispatcher is loaded (macro
\code{get_dispatcher_shared_arm}), and a check is made to see if the
419dispatcher is ``enabled'' (in other words, whether the dispatcher
420should be upcalled when next dispatched).   
421
This latter check is performed by the macro \code{disp_is_disabled},
which returns non-zero if:
\begin{enumerate}
\item The \code{disabled} value in the dispatcher (at offset
  \code{OFFSETOF_DISP_DISABLED}) is non-zero, \emph{or}
427\item The PC lies between the two values in the dispatcher with
428  offsets \code{OFFSETOF_DISP_CRIT_PC_LOW} and
429  \code{OFFSETOF_DISP_CRIT_PC_HIGH}\footnote{A trick suggested by
430    Justin Cappos to allow an atomic resume of a user-level thread
431    without entering the kernel}. 
432\end{enumerate}
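The check can be expressed in C as follows. This is a sketch mirroring the assembly macro, not its actual implementation: the structure below is a hypothetical stand-in for the relevant fields of the shared dispatcher, and treating the critical region as running from \code{crit_pc_low} up to but excluding \code{crit_pc_high} is an assumption.

\begin{lstlisting}[language=C]
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the shared dispatcher fields read by the check;
 * the real structure layout is architecture-specific. */
struct disp_shared_sketch {
    uint32_t disabled;      /* non-zero while the dispatcher is disabled */
    uint32_t crit_pc_low;   /* start of the critical (resume) code region */
    uint32_t crit_pc_high;  /* end of the critical region (exclusive here) */
};

/* The dispatcher is treated as disabled if its flag is set, or if the
 * trapping PC lies in the critical region [crit_pc_low, crit_pc_high). */
static bool disp_is_disabled_sketch(const struct disp_shared_sketch *d,
                                    uint32_t pc)
{
    if (d->disabled != 0) {
        return true;
    }
    return pc >= d->crit_pc_low && pc < d->crit_pc_high;
}
\end{lstlisting}

The PC test lets user-level code resume a thread without entering the kernel: while executing the short resume sequence, the dispatcher counts as disabled even though its flag has already been cleared.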
433
434Depending on this, context is saved in a different area of the
435dispatcher, \mode{System} is entered, and a call is made to C code as
436noted below. 
437
438Taking each exception in turn:
439
440\section{Reset exception}
441
This is vector offset 0x00, and is not used.
443
444\section{Undefined Instruction exception}
445
446This is vector offset 0x04, and is referred to as 
447\code{ARM_EVECTOR_UNDEF} in the source.   
448The processor enters \code{undef_handler} in \mode{Undefined}.
449Context is saved in either the \texttt{ENABLED} or \texttt{TRAP} area.
450C is entered at \code{exn.c:handle_user_undef}. 
451
452\section{Supervisor call (software interrupt)}
453
454This is vector offset 0x08, and referred to as
455\code{ARM_EVECTOR_SWI} in the source. 
456The processor enters \code{swi_handler} in \mode{Supervisor}.
457
458If the syscall was issued from user space, context is saved in either
459the \code{ENABLED} or \code{DISABLED} area.  C is entered at
460\code{syscall.c:sys_syscall}.  
461
462If the syscall was issued from kernel space, no context is saved and
463C is entered at \code{syscall.c:sys_syscall_kernel}. 
464
465\section{Prefetch Abort exception}
466
467This is vector offset 0x0C, and referred to as
468\code{ARM_EVECTOR_PABT} in the source. 
469The processor enters \code{pabt_handler} in \mode{Abort}.
470
471Context is saved in either the \texttt{ENABLED} or \texttt{TRAP} area.
472C is entered at \code{exn.c:handle_user_page_fault}. 
473
474\section{Data Abort exception}
475
476This is vector offset 0x10, and referred to as
477\code{ARM_EVECTOR_DABT} in the source. 
478The processor enters \code{dabt_handler} in \mode{Abort}.
479
480Context is saved in either the \texttt{ENABLED} or \texttt{TRAP} area.
481C is entered at \code{exn.c:handle_user_page_fault} with the faulting
482address in \code{r0}. 
483
484\section{Hyp Trap, or Hyp mode entry}
485
486This is vector offset 0x14, and is not used in Barrelfish.
487
488\section{IRQ interrupt}
489
490This is vector offset 0x18, and referred to as
491\code{ARM_EVECTOR_IRQ} in the source. 
492The processor enters \code{irq_handler} in \mode{IRQ}. 
493
If the interrupt was taken while in user space, context is saved in either
the \code{ENABLED} or \code{DISABLED} area.  C is entered at
\code{exn.c:handle_irq}.

If the interrupt was taken while in kernel space, context is saved in
\code{irq_save_area}, \mode{System} is entered, and C is called at
\code{exn.c:handle_irq}.
501
502\section{Fast interrupt}
503
504This is vector offset 0x1C, and referred to as
505\code{ARM_EVECTOR_FIQ} in the source. 
506The processor enters \code{fiq_handler} in \mode{FIQ}. 
507
If the interrupt was taken while in user space, context is saved in either
the \code{ENABLED} or \code{DISABLED} area.  C is entered at
\code{exn.c:handle_irq} (as for IRQ).

If the interrupt was taken while in kernel space, context is saved in
\code{irq_save_area}, \mode{System} is entered, and C is called at
\code{exn.c:handle_irq} (as for IRQ).
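The vector offsets above, and the choice of context save area for exceptions taken in \mode{User}, can be summarized in a short C sketch. The enumeration and function names are hypothetical; the sketch assumes that an exception taken while the dispatcher is disabled uses the \texttt{TRAP} area for faults and the \texttt{DISABLED} area for system calls and interrupts, while an enabled dispatcher always uses the \texttt{ENABLED} area.

\begin{lstlisting}[language=C]
#include <stdbool.h>

/* ARMv7-A exception vector offsets, as listed in this chapter. */
enum arm_evector {
    EVEC_RESET = 0x00,
    EVEC_UNDEF = 0x04,
    EVEC_SWI   = 0x08,
    EVEC_PABT  = 0x0C,
    EVEC_DABT  = 0x10,
    EVEC_HYP   = 0x14,
    EVEC_IRQ   = 0x18,
    EVEC_FIQ   = 0x1C,
};

/* Dispatcher areas in which user context may be saved. */
enum save_area { AREA_ENABLED, AREA_DISABLED, AREA_TRAP, AREA_NONE };

/* Hypothetical summary of the save-area choice for an exception taken
 * from User mode, given the result of the disabled check. */
static enum save_area user_save_area(enum arm_evector v, bool disp_disabled)
{
    switch (v) {
    case EVEC_UNDEF:
    case EVEC_PABT:
    case EVEC_DABT:
        return disp_disabled ? AREA_TRAP : AREA_ENABLED;
    case EVEC_SWI:
    case EVEC_IRQ:
    case EVEC_FIQ:
        return disp_disabled ? AREA_DISABLED : AREA_ENABLED;
    default:
        return AREA_NONE;   /* reset and Hyp vectors are unused */
    }
}
\end{lstlisting}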
515
516%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
517\chapter{The Dispatch mechanism}
518
519Each time a CPU driver decides to switch to running a domain, it
520dispatches the domain in one of two ways:
521
522\begin{description}
523\item[RESUME], also known as ``disabled'': in this mode, the domain is
524  resumed exactly where it was preempted before, much as in operating
525  systems like Unix. 
526\item[UPCALL], also known as ``enabled'': as with Scheduler
527  Activations, the domain is upcalled at a fixed address with a new
528  context on a small, dedicated stack.  The context of the
  previously-running thread in the domain is available to be resumed
530  in user space, if the user-level scheduler (also known as the
531  activation handler) decides to. 
532\end{description}
533
534Which one of these happens depends on the state of the domain.  
535
536When a domain is running in user space (i.e. the kernel is \emph{not}
537executing) the domain is in one of two states, indicated by a
538combination of:
539\begin{itemize}
540\item the \code{disabled} field of the \code{struct
541  dispatcher_shared_generic} structure,
542\item the current program counter,
543\item the \code{crit_pc_low} and \code{crit_pc_high} fields of the \code{struct
544  dispatcher_shared_generic} structure.
545\end{itemize}
546
547Note that all of these values can be written by the user program. 
548
549Specifically, the domain is in \code{RESUME} state \emph{iff}:
550\begin{enumerate}
551\item \code{disabled} is \code{true}, \emph{or}
552\item the current program counter lies between \code{crit_pc_low} and
553  \code{crit_pc_high} 
554\end{enumerate}
555
556Otherwise, it is in state \code{UPCALL}.  
557
558Once the kernel is entered, the \code{disabled} flag of the domain's
559\code{struct dcb} structure (as opposed to the \code{struct
560  dispatcher_shared_generic}) is updated to reflect the state of the
561preempted domain. 
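The resulting decision at dispatch time is simple, as the following C sketch illustrates. The reduced structure and function names are hypothetical; in the CPU driver the flag lives in \code{struct dcb}, and the actual resume and upcall paths are architecture-specific.

\begin{lstlisting}[language=C]
#include <stdbool.h>

enum dispatch_kind { DISPATCH_RESUME, DISPATCH_UPCALL };

/* Hypothetical reduced DCB holding only the flag discussed above. */
struct dcb_sketch {
    bool disabled;   /* recorded on kernel entry */
};

/* On kernel entry: record whether the preempted domain was disabled,
 * i.e. the outcome of the disabled/critical-region check. */
static void record_entry_state(struct dcb_sketch *dcb, bool was_disabled)
{
    dcb->disabled = was_disabled;
}

/* When next dispatched: resume the saved context if the domain was
 * disabled, otherwise upcall it at its fixed entry point. */
static enum dispatch_kind choose_dispatch(const struct dcb_sketch *dcb)
{
    return dcb->disabled ? DISPATCH_RESUME : DISPATCH_UPCALL;
}
\end{lstlisting}

Recording the state in the kernel-owned DCB, rather than re-reading the user-writable shared structure at dispatch time, fixes the decision at the moment the domain was preempted.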
562
563
564%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
565\chapter{Key data structures}
566
567\begin{itemize}
568\item \code{struct dcb}: in \code{kernel/include/dispatch.h};
569  the main domain control block.   
570
571  \code{dcb_current} is a global pointer in the CPU driver that points
572  to the current DCB. 
573  
  If \code{dp} is of type \code{struct dcb *}, then
  \code{dp->disabled} is a flag which is 1 if the current DCB has
  activations disabled (i.e. it should be resumed when next scheduled
  to run) and 0 otherwise (in which case it should be upcalled); the
  analogy is with enabling and disabling interrupts.  The flag is set
  on entry to the kernel.
580
\item \code{struct dispatcher_shared_generic}: in
  \code{include/barrelfish_kpi/dispatcher_shared.h}: the
  architecture-independent part of a dispatcher, the user-space
  data structure corresponding to a DCB.  This is the first struct in
  architecture-dependent variants, such as \code{struct
    dispatcher_shared_arm}.

  If \code{dp} is of type \code{struct dispatcher_shared_generic *}, then
  \code{dp->disabled} is a flag which is set to 1 by the user-level
  dispatcher while it is disabled (i.e. it must be resumed rather than
  upcalled if preempted), and 0 otherwise.  Unlike the flag in
  \code{struct dcb}, it is written by user space.
591\end{itemize}
592
593%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
594\chapter{Hardware abstraction layers}
595
596Barrelfish distinguishes between:
597\begin{itemize}
598\item General code
599\item Architecture-specific code (e.g. ARMv7-A code)
600\item Platform-specific code (e.g. code for the OMAP4460 SoC)
601\end{itemize}
602
Since most Barrelfish device drivers run in userspace, the difference
between ``platform'' as a chip (such as the OMAP4460) and ``platform''
as a board or complete machine (such as the PandaBoard ES) is
relatively unimportant inside the CPU driver: most of the
platform-specific CPU driver code is actually specific to a chip or
SoC.
609
610Barrelfish CPU driver source code for ARMv7-A systems therefore
611consists of the following categories:
612\begin{itemize}
613\item Portable, architecture-independent code.
\item ARMv7-A-specific code which is common to all ARMv7-A platforms.
615\item Code for particular devices or macrocells which are only used on
616  ARMv7-A, but might appear on multiple ARMv7-A platforms.
617\item Platform-specific code. 
618\end{itemize}
619
620We restrict platform-specific code to a single source file, which
621roughly corresponds to ARM's concept of an ``integrator'', and acts as
a compilation-time indirection layer between common ARMv7-A-specific
623code and individual device and macrocell drivers.  
624
625\section{The ARMv7-A HAL}
626
627Platform code for a Barrelfish ARMv7-A CPU driver must implement the
628following interfaces:
629
630\begin{description}
\item[serial.h]: Low-level drivers for multiple UART devices.
632\item[spinlock.h]: Some number of static spinlocks, used for
633  coordinating access to e.g. serial devices between CPU drivers on
634  different cores. 
635\end{description}
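A minimal sketch of what a platform file might provide for these two interfaces is shown below. The function names, the buffer standing in for UART registers, and the fixed numbers of ports and locks are all hypothetical; the actual \file{serial.h} and \file{spinlock.h} interfaces in the Barrelfish tree may differ. The spinlocks here use C11 \code{atomic_flag} purely for illustration.

\begin{lstlisting}[language=C]
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical HAL interface that a platform file would implement. */
void serial_putchar(unsigned port, char c);
void spinlock_acquire(unsigned lock);
void spinlock_release(unsigned lock);

/* --- Toy platform implementation, for illustration only --- */

#define NUM_PORTS      2
#define NUM_SPINLOCKS  4
#define BUF_SIZE       64

/* Stand-in for UART transmit registers: characters land in a buffer. */
static char   uart_buf[NUM_PORTS][BUF_SIZE];
static size_t uart_len[NUM_PORTS];

void serial_putchar(unsigned port, char c)
{
    if (port < NUM_PORTS && uart_len[port] < BUF_SIZE) {
        uart_buf[port][uart_len[port]++] = c;
    }
}

/* A fixed set of static spinlocks shared between CPU drivers. */
static atomic_flag locks[NUM_SPINLOCKS] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

void spinlock_acquire(unsigned lock)
{
    while (atomic_flag_test_and_set_explicit(&locks[lock],
                                             memory_order_acquire)) {
        /* spin until the holder releases the lock */
    }
}

void spinlock_release(unsigned lock)
{
    atomic_flag_clear_explicit(&locks[lock], memory_order_release);
}
\end{lstlisting}

Because the interfaces are plain function declarations, selecting a platform at build time amounts to compiling and linking a different implementation file, in line with the compilation-time indirection described above.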
636
637
638
639%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
640\chapter{Code organization}
641
The variety of ARM platforms makes organizing source trees to maximise
code reuse across different platforms a challenge.
644
645Barrelfish distinguishes between \emph{Architectures}, which are
646typically processor architectures like ``ARMv7-A'', and \emph{Platforms},
647which are complete system targets, like ``PandaBoard-ES''. 
648
Code and headers specific to a particular architecture are found in
various subdirectories of the source tree of the form
\file{../arch/armv7/}.
652
653%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
654\chapter{Versatile Express platform}
655
656%--------------------------------------------------
657\chapter{GEM5 specifics}
658
659The GEM5~\cite{gem5:sigarch11} simulator combines the best aspects of
660the M5~\cite{m5:micro06} and GEMS~\cite{gems:sigarch05}
simulators. With its flexible and highly modular design, GEM5 allows
the simulation of a wide range of systems. It supports a variety of
ISAs, including x86, SPARC, Alpha and, most importantly in our case,
ARM. In the following we list some features of GEM5.
665
GEM5 supports four different CPU models: AtomicSimple, TimingSimple,
InOrder and O3. The first two are simple one-cycle-per-instruction
CPU models. The difference between the two lies in the way they handle
memory accesses. The AtomicSimple model completes all memory accesses
immediately, whereas the TimingSimple CPU models the timing of memory
accesses. Due to their simplicity, their simulation speed is far higher
than that of the other two models.  The InOrder CPU models an in-order
pipeline and focuses on timing and simulation accuracy. The pipeline can be
configured to model different numbers of stages and hardware threads.
The O3 CPU models a pipelined, out-of-order and possibly superscalar
CPU. It simulates dependencies between instructions, memory
accesses, pipeline stages and functional units. With a load/store
queue and reorder buffer, it is possible to simulate superscalar
architectures as well as multiple hardware threads.
680
681The GEM5 simulator provides a tight integration of Python into the
simulator. Python is mainly used for system configuration. Every
simulated building block of a system is implemented in C++ but is
also reflected as a Python class deriving from a single superclass,
SimObject. This provides a very flexible way of system construction
and allows us to tailor nearly every aspect of the system to our needs.
687Python is also used to control the simulation, taking and restoring
688snapshots as well as all the command line processing.
689
690We use a VExpress\_EMM based system to run Barrelfish. The number of
691cores can be passed as an argument to the GEM5 script. Cores are
692clocked at 1 GHz and main memory is 64 MB starting at 2 GB.
693
694\section{Boot process: first (bootstrap) core}
695
696% Source: Samuel's thesis, 4.1.1
697
This section gives a high-level overview of the boot process of the
Barrelfish kernel on ARMv7-A. Subsequent sections go into more detail
on the individual steps.
703\begin{enumerate}
\item Set up the kernel stack and ensure privileged mode
705\item Allocate L1 page table for kernel
706\item Create necessary mappings for address translation
707\item Set translation table base register (TTBR) and domain
708  permissions
709\item Activate MMU, relocate program counter and stack pointer
\item Invalidate TLB, set up arguments for the first C function, \code{arch_init}
\item Set up exception handling
712\item Map the available physical memory in the kernel L1 page table
713\item Parse command line and set corresponding variables
714\item Initialize devices
715\item Create a physical memory map for the available memory
716\item Check ramdisk for errors
\item Initialize and switch to init's address space
718\item Load init image from ramdisk into memory
719\item Load and create capabilities for modules defined by menu.lst
720\item Start timer for scheduling
721\item Schedule init and switch to user space
\item init brings up the monitor and the memory server
723\item monitor spawns ramfsd, skb and all the other modules
724\end{enumerate}
725
726\section{Boot process: subsequent cores}
727
728% Source: Samuel, 4.2.2
729
The boot protocol for the multi-core port differs in various ways
from the boot procedure of our previous single-core port. We
therefore include this revised overview here. The first core is called
the bootstrap processor (BSP) and every subsequent core is called an
application processor (AP). On the bootstrap processor:
735
736\begin{enumerate}
\item Pass arguments from the bootloader to the first C function,
  \code{arch_init}
739\item Make multiboot information passed by bootloader globally
740  available
741\item Create 1:1 mapping of address space and alias the same region at
742  high memory
743\item Configure and activate MMU
744\item Relocate kernel image to high memory
745\item Reset mapping, only map in the physical memory aliased at high
746  memory
747\item Parse command line and set corresponding variables
748\item Initialize devices
\item Initialize and switch to init's address space
750\item Load init image into memory
751\item Create capabilities for modules defined by the multiboot info
752\item Schedule init and switch to user space
\item init brings up the monitor and the memory server
\item monitor spawns ramfsd, skb and all the other modules
\item spawnd parses its command line and tells the monitor to bring up a
  new core
\item monitor sets up the inter-monitor communication channel
758\item monitor allocates memory for new kernel and remote monitor
759\item monitor loads kernel image and relocates it to destination
760  address
761\item monitor setups boot information for new kernel
762\item spawnd issues syscall to start new core
763\item Kernel writes entry address for new core into SYSFLAG registers
764\item Kernel raises software interrupt to start new core
765\item Kernel spins on pseudo-lock until other kernel releases it
766\item repeat steps 15 to 23 for each application processor
767\end{enumerate}
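
The last kernel steps of the sequence above (publishing the entry address, raising the interrupt, and spinning on the pseudo-lock) form a small handshake between the two kernels. The following minimal sketch models that handshake with C11 atomics. The names \code{bsp_start_core}, \code{ap_release}, and \code{fake_ap_boot} are hypothetical, the SYSFLAG register is modelled as an ordinary variable, and the interrupt is replaced by a callback that simulates the new core, so that the handshake can be exercised on a single processor. On real hardware the write would go to a device register and a \code{DSB} barrier would be needed before raising the interrupt.

\begin{lstlisting}
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Callback that wakes the new core; on hardware this would trigger a
 * software interrupt (assumption: modelled as a plain function here). */
typedef void (*raise_sgi_fn)(void *arg);

/* BSP side: take the pseudo-lock, publish the entry address, wake the
 * new core, then spin until the new kernel releases the lock. */
static void bsp_start_core(atomic_uint *lock, volatile uint32_t *boot_reg,
                           uint32_t entry, raise_sgi_fn raise_sgi, void *arg)
{
    assert((entry & 0x3u) == 0);   /* ARM-state entry is word aligned */
    atomic_store_explicit(lock, 1u, memory_order_relaxed);
    *boot_reg = entry;
    /* Order both stores before the wake-up (a DSB on real hardware). */
    atomic_thread_fence(memory_order_seq_cst);
    raise_sgi(arg);
    while (atomic_load_explicit(lock, memory_order_acquire) != 0u) {
        /* spin until the other kernel releases the pseudo-lock */
    }
}

/* AP side: the new kernel releases the pseudo-lock at some point
 * during its own boot. */
static void ap_release(atomic_uint *lock)
{
    atomic_store_explicit(lock, 0u, memory_order_release);
}

/* Simulation only: stands in for the new core, which reads the entry
 * address and then releases the pseudo-lock. */
struct fake_ap {
    atomic_uint *lock;
    volatile uint32_t *boot_reg;
    uint32_t seen_entry;
};

static void fake_ap_boot(void *arg)
{
    struct fake_ap *ap = arg;
    ap->seen_entry = *ap->boot_reg;
    ap_release(ap->lock);
}
\end{lstlisting}

In the real protocol the release happens asynchronously on the other core; the callback merely makes the required ordering explicit: the entry address is written before the wake-up, and the BSP kernel does not proceed until the lock has been released.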


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{OMAP44xx platform}

% Source: Claudio 3.1

The OMAP4460 is a system on a chip (SoC) by Texas Instruments,
intended for use in consumer devices such as smartphones and tablet
computers. It contains:

\begin{itemize}
\item A dual-core ARM Cortex-A9 processor
\item Two ARM Cortex-M3 processors
\item A hardware spinlock module
\item A mailbox module
\item Many devices to process media input and output
\end{itemize}

The intention is that the Cortex-A9 runs a general-purpose
operating system, while the Cortex-M3 processors run only
a real-time operating system that controls the imaging subsystem.

The processor configuration in the OMAP4460 is somewhat
unconventional. For example, the Cortex-M3 processors share a
custom MMU whose page faults are handled by code running on the
Cortex-A9 processors, and hence they are constrained to run in the
same virtual address space at all times. They are also not
cache-coherent with the Cortex-A9 cores.

\section{Compiling and booting}

To compile Barrelfish for the Pandaboard, first configure your
toolchain as described in Section~\ref{sec:armcompile}. Then execute:

\begin{lstlisting}
cd @\shell@SRC
mkdir build
cd build
../hake/hake.sh -a armv7 -s ../
make pandaboard_image
\end{lstlisting}

The resulting image can be booted on the Pandaboard over the USB OTG
connector using the standard \texttt{usbboot} utility.  It will
generate console output on the Pandaboard's serial connector.

\section{Booting the second OMAP A9 core}

% source: AOS m6

Briefly, the bootstrapping process for the second core works as
follows: the core waits for a signal (an interrupt) from the BSP core.
When this signal arrives, the application core reads an address from a
well-defined register and starts executing code at that address.

To boot the second core, the BSP therefore writes the address of the
entry function into this register and then sends the inter-processor
interrupt. The following pointers into the documentation explain the
bootstrapping process in more detail:

\begin{itemize}
\item Section 27.4.4 of the OMAP44xx manual describes the boot process for
  application cores.
\item Pages 1144 \textit{ff.} of the OMAP44xx manual give the layout of
  the registers used in the boot process of the second core.
\end{itemize}
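
The two actions described above, writing the entry address and sending the inter-processor interrupt, might look roughly as follows. This is a sketch under assumptions: \code{start_second_core} and \code{sgir_value} are hypothetical names, the boot-address register is passed in as a pointer instead of hard-coding its OMAP44xx address (see the manual pages cited above), and the interrupt is assumed to be a software-generated interrupt (SGI) sent through the \code{GICD_SGIR} register of the Cortex-A9 interrupt distributor, using the field layout from the ARM GIC architecture specification.

\begin{lstlisting}
#include <assert.h>
#include <stdint.h>

/* Fields of the GIC distributor's Software Generated Interrupt
 * Register (GICD_SGIR). */
#define SGIR_FILTER_TARGET_LIST (0u << 24)  /* use the CPU target list */
#define SGIR_CPU_TARGET(cpu)    (1u << (16u + (cpu)))
#define SGIR_INTID(id)          ((id) & 0xFu)

static uint32_t sgir_value(unsigned cpu, unsigned sgi_id)
{
    assert(cpu < 8 && sgi_id < 16);
    return SGIR_FILTER_TARGET_LIST | SGIR_CPU_TARGET(cpu) | SGIR_INTID(sgi_id);
}

/* Hypothetical: boot_addr_reg points at the register the second core
 * reads its start address from, and sgir at GICD_SGIR; both would be
 * device mappings on real hardware. */
static void start_second_core(volatile uint32_t *boot_addr_reg,
                              volatile uint32_t *sgir,
                              uint32_t entry_paddr)
{
    /* A physical address: the second core starts with its MMU off. */
    *boot_addr_reg = entry_paddr;
    /* Real code needs a DSB here so the write is visible first. */
    *sgir = sgir_value(1, 0);   /* SGI 0 to CPU 1 */
}
\end{lstlisting}

For example, \code{sgir_value(1, 0)} yields \code{0x00020000}: target-list filter~0, CPU~1 selected in the target list (bit~17), and SGI number~0.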

Note that the Barrelfish codebase distinguishes between the BSP (bootstrap)
processor and APP (application) processors. This distinction and naming
originate from Intel x86 support, where the BIOS chooses a
distinguished BSP processor at start-up and the OS
is responsible for starting the remaining processors (the APP
processors). Although the mechanism differs somewhat on
ARM, the naming convention applies here as well.

Note also that the second core starts with the MMU disabled, and so
runs in the physical address space. The bootstrapping code sets up a
stack, initial page tables, and an initial Barrelfish dispatcher.

\section{Physical address space}

At present, a temporary limitation in the core boot protocol means
that running Barrelfish on both A9 cores requires static partitioning of
the available RAM into two halves, with an independent memory server
running on each core. This will be fixed in a subsequent release.
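
As a worked example of this static partitioning, assume a Pandaboard with 1\,GiB of RAM starting at physical address \code{0x80000000}; these figures are an assumption about the board, not something stated by the boot code. Splitting the range in half at a 1\,MiB boundary gives each memory server 512\,MiB. The helper \code{split_ram} and the \code{mem_region} type below are hypothetical.

\begin{lstlisting}
#include <assert.h>
#include <stdint.h>

/* A contiguous range of physical memory. */
struct mem_region {
    uint64_t base;
    uint64_t size;
};

/* Hypothetical helper: split one RAM region into two parts, one per
 * core, with the first part's size rounded down to a multiple of
 * 'align' (a power of two). Any remainder goes to the second core. */
static void split_ram(struct mem_region ram, uint64_t align,
                      struct mem_region out[2])
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uint64_t half = (ram.size / 2) & ~(align - 1);

    out[0].base = ram.base;
    out[0].size = half;
    out[1].base = ram.base + half;
    out[1].size = ram.size - half;
}
\end{lstlisting}

With these assumptions, the memory server on core~0 would manage \code{0x80000000} to \code{0x9FFFFFFF} and the one on core~1 would manage \code{0xA0000000} to \code{0xBFFFFFFF}.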

\section{Interconnect driver}\label{sec:interconnect}

Communication between A9 cores on the OMAP processor is performed
using a variant of the CC-UMP interconnect driver, modified for the
32-byte cache line size of the Cortex-A9. A notification
driver for inter-processor interrupts also exists.
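
To illustrate why the cache line size matters, the sketch below shows a simplified message slot in the style of CC-UMP, sized to exactly one 32-byte line so that a receiver polling the slot touches a single line. It is not Barrelfish's actual UMP code: the names, the seven-word payload, and the use of the top header bit as an epoch flag (toggled each time the sender wraps around the ring) are assumptions chosen for illustration. The sender writes the payload first and the header last, with release/acquire ordering, so a receiver that sees the expected epoch also sees the complete payload.

\begin{lstlisting}
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_BYTES  32
#define UMP_PAYLOAD_WORDS 7
#define UMP_EPOCH_BIT     (1u << 31)

/* One message occupies exactly one cache line: payload words first,
 * header word last. */
struct ump_msg {
    _Alignas(CACHE_LINE_BYTES) uint32_t payload[UMP_PAYLOAD_WORDS];
    _Atomic uint32_t header;  /* epoch bit plus a 31-bit message type */
};

_Static_assert(sizeof(struct ump_msg) == CACHE_LINE_BYTES,
               "a message must fill exactly one cache line");

/* Sender: fill the payload, then publish it by writing the header. */
static void ump_send(struct ump_msg *slot,
                     const uint32_t data[UMP_PAYLOAD_WORDS],
                     bool epoch, uint32_t type)
{
    assert((type & UMP_EPOCH_BIT) == 0);  /* type must fit in 31 bits */
    for (int i = 0; i < UMP_PAYLOAD_WORDS; i++) {
        slot->payload[i] = data[i];
    }
    uint32_t hdr = (epoch ? UMP_EPOCH_BIT : 0u) | type;
    atomic_store_explicit(&slot->header, hdr, memory_order_release);
}

/* Receiver: a message is present when the header's epoch bit matches
 * the epoch the receiver expects for this pass over the ring. */
static bool ump_poll(struct ump_msg *slot, bool expected_epoch)
{
    uint32_t hdr = atomic_load_explicit(&slot->header, memory_order_acquire);
    return ((hdr & UMP_EPOCH_BIT) != 0) == expected_epoch;
}
\end{lstlisting}

Because a zeroed slot has its epoch bit clear, a receiver that starts by expecting epoch~1 does not mistake an empty ring for a full one.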

The OMAP4460 also has mailbox hardware, which can be used by both the
A9 and M3 cores. Barrelfish support for this hardware is in
progress.

\section{M3 cores}

Barrelfish also has rudimentary support for running on both the A9 and
M3 cores. This support is limited because the M3 cores must run in the
same virtual address space and have no way to change address space
automatically on a kernel trap. For this reason, we currently execute
on only a single M3 core.

Before the Cortex-M3 can start executing code, the Cortex-A9 has to
take the following steps:

\begin{enumerate}
\item Power on the Cortex-M3 subsystem
\item Activate the Cortex-M3 subsystem clock
\item Load the image to be executed into memory
\item Enable the L2 MMU
\item Set up mappings for the loaded image in the L2 MMU (these can be
  written directly into the TLB)
\item Write the first two entries of the vector table (initial stack
  pointer and reset vector)
\item Take the Cortex-M3 out of reset
\end{enumerate}
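
Step~6 above can be sketched as follows. On ARMv7-M, the first word of the vector table holds the initial main stack pointer and the second holds the address of the reset handler, which must have bit~0 set because the Cortex-M3 executes only Thumb code. The function name and parameters are hypothetical, and \code{vt} is assumed to be the Cortex-A9's own mapping of the memory that the L2 MMU maps at Cortex-M3 virtual address 0.

\begin{lstlisting}
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper run on the Cortex-A9 before releasing the
 * Cortex-M3 from reset. 'vt' is the A9's mapping of the memory the
 * M3 will see at virtual address 0. */
static void m3_write_initial_vectors(volatile uint32_t *vt,
                                     uint32_t initial_sp,
                                     uint32_t reset_handler)
{
    /* The AAPCS expects an 8-byte aligned stack pointer. */
    assert((initial_sp & 0x7u) == 0);

    vt[0] = initial_sp;          /* entry 0: initial main stack pointer */
    vt[1] = reset_handler | 1u;  /* entry 1: reset vector; bit 0 = Thumb */
}
\end{lstlisting}

These entries must be in place before step~7, because the Cortex-M3 fetches both words from virtual address 0 as soon as it leaves reset.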

It is important to note that the Cortex-M3 is in a virtual address
space from the very beginning, reading the vector table at virtual
address 0. Inserting a 1:1 mapping for the kernel image greatly
simplifies the bootstrapping of memory management on the Cortex-M3
once it is running, because it needs to know the physical address of
the page tables it sets up.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliographystyle{abbrv}
\bibliography{defs,barrelfish}

\end{document}
