%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Copyright (c) 2013-2016, ETH Zurich.
% All rights reserved.
%
% This file is distributed under the terms in the attached LICENSE file.
% If you do not find this file, copies can be found by writing to:
% ETH Zurich D-INFK, Universitaetstr. 6, CH-8092 Zurich. Attn: Systems Group.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\documentclass[a4paper,twoside]{report} % for a report (default)

\usepackage{bftn} % You need this
\usepackage{multirow}
\usepackage{listings}
\usepackage{color}
\usepackage{xspace}

\title{Barrelfish on ARMv7-A} % title of report
\author{Simon Gerber \and Stefan Kaestle \and Timothy Roscoe \and
  Pravin Shinde \and Gerd Zellweger}
\tnnumber{017} % give the number of the tech report
\tnkey{ARMv7-A} % Short title, will appear in footer

% \date{Month Year} % Not needed - will be taken from version history

\newcommand{\todo}[1]{\note{\textbf{TODO:} #1}}

\begin{document}
\maketitle

\newcommand{\code}[1]{{\lstinline!#1!}}
\newcommand{\file}[1]{{\lstinline!#1!}}
\newcommand{\mode}[1]{\texttt{#1} mode\xspace}

%configure listings properly
\lstset{%
  basicstyle=\small\ttfamily,
  escapechar=@
}

%
% Include version history first
%
\begin{versionhistory}
\vhEntry{0.1}{05.12.2013}{SK}{Initial version}
\vhEntry{0.2}{08.12.2015}{TR}{Rewritten for new ARMv7 code}
\vhEntry{1.0}{31.05.2016}{TR}{Newly-factored ARMv7 platform support}
\end{versionhistory}

% \intro{Abstract} % Insert abstract here
% \intro{Acknowledgements} % Uncomment (if needed) for acknowledgements
\tableofcontents % Uncomment (if needed) for final draft
% \listoffigures % Uncomment (if needed) for final draft
% \listoftables % Uncomment (if needed) for final draft

\lstset{
  language=C,
  basicstyle=\ttfamily \small,
  flexiblecolumns=false,
  basewidth={0.5em,0.45em},
  boxpos=t,
}

\newcommand{\eclipse}{ECL\textsuperscript{i}PS\textsuperscript{e}\xspace}
\newcommand{\codesize}{\scriptsize}
\newcommand{\note}[1]{[\textcolor{red}{\emph{#1}}]}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Introduction}

This document describes the state of support for ARMv7-A processors in
Barrelfish.

ARM hardware is highly diverse, and has evolved over time. As a
research OS, Barrelfish focusses ARM support on a small number of
platforms based on wide availability, ease of maintenance, and
research interest. However, since management of hardware complexity
and diversity is also a research goal of the Barrelfish project, we
aim to make it easy to add new ARM-based platforms with a mixture of
traditional and non-traditional engineering techniques.

The principal processors with 32-bit ARM support in Barrelfish at present are
ARMv7-A (Cortex A-series), in particular the Cortex A9.

Past support for older ARM 32-bit architectures in Barrelfish included:
\begin{itemize}
\item ARMv7-M (Cortex M-series), in particular the Cortex M3.
\item ARMv5 processors, in particular the Intel iXP2800 network
  processor (which uses an XScale core).
\item ARMv6 (ARM11MP) processors running under simulation in
  \file{qemu}.
\end{itemize}

The main 32-bit ARM-based systems we target at present are:
\begin{itemize}
\item The Texas Instruments OMAP4460 SoC used in the Pandaboard ES
  platform.
\item The ARM VExpress\_EMM board, under emulation in the GEM5
  simulator.
\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Compilation}
\label{sec:armcompile}

Building Barrelfish for ARMv7 is straightforward; detailed
requirements for packages are described in the latest README file.

Compiling ARM support in Barrelfish requires a cross-compilation
toolchain on the programmer's \code{PATH}.
For ARMv7 support we
track the GNU toolchain shipped with Ubuntu LTS (14.04.3 at time of
writing).

Once you have the right tools, run hake with the correct options,
e.g.:
\begin{lstlisting}
$ cd /build/barrelfish
$ /git/barrelfish/hake/hake.sh -a armv7 -s /git/barrelfish
...
$
\end{lstlisting}

After running \code{hake} with appropriate architecture support
(i.e. use \code{-a armv7}), you can ask the Makefile what platforms it
supports:

\begin{lstlisting}
$ make help-platforms
------------------------------------------------------------------
Platforms supported by this Makefile. Use 'make <platform name>':
 (these are the platforms available with your architecture choices)

 Documentation:
  Documentation for Barrelfish
 PandaboardES:
  Standard Pandaboard ES build image and modules
 ARMv7-GEM5:
  GEM5 emulator for ARM Cortex-A series multicore processors
------------------------------------------------------------------
$
\end{lstlisting}

Then build:

\begin{lstlisting}
$ make -j 8 PandaboardES
\end{lstlisting}

\section{Building for GEM5}

To boot Barrelfish in GEM5, in addition to the previous steps you
will need a supported version of GEM5. The GEM5 website
(\url{gem5.org}) has comprehensive information.

Unfortunately, different
versions of GEM5 manifest different subtle bugs when emulating ARM
systems. We recommend revision 0fea324c832c of GEM5 at present;
please let us know if you find a more recent version that works well.

To fetch and build GEM5 on Ubuntu LTS:

\begin{lstlisting}
$ sudo apt-get install scons swig python-dev libgoogle-perftools-dev m4 protobuf-compiler libprotobuf-dev
$ hg clone http://repo.gem5.org/gem5 -r 0fea324c832c gem5
adding changesets
adding manifests
adding file changes
added 9356 changesets with 53499 changes to 6576 files
updating to branch default
3269 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cd ./gem5
$ scons build/ARM/gem5.fast
...

$
\end{lstlisting}

GEM5 is a large system and may take some time to build. In addition,
you may have to install minor fixes to ensure compilation (I had to
add some initializers to \file{mem/ruby/network/orion/Wire.cc}, for
example).

After the compilation of GEM5 is finished, add the binary to your PATH.

Now, build Barrelfish like this:
\begin{lstlisting}
$ make -j 8 ARMv7-GEM5
\end{lstlisting}

It's a good idea to set \code{armv7_platform} in
\file{<build_dir>/hake/Config.hs} to \texttt{gem5} in order to enable
the cache quirk workarounds for GEM5 and proper offsets for the
platform simulated by GEM5.

You can also build Barrelfish and boot inside GEM5 in a single step:

\begin{lstlisting}
$ make help-boot
------------------------------------------------------------------
Boot instructions supported by this Makefile. Use 'make <boot name>':
 (these are the targets available with your architecture choices)

 gem5_armv7:
  Boot an ARMv7a multicore image in GEM5
 gem5_armv7_detailed:
  Boot an ARMv7a multicore image in GEM5 using a detailed CPU model
$ make gem5_armv7
...
\end{lstlisting}

To get the output of Barrelfish you should:
\begin{lstlisting}
$ telnet localhost 3456
\end{lstlisting}

GEM5 is a highly configurable simulator.
You can print the supported
options of the GEM5 script as follows:

\begin{lstlisting}
$ gem5.fast gem5/gem5script.py -h
\end{lstlisting}

Note that if you boot using \code{make gem5_armv7_detailed} rather than
\code{make gem5_armv7}, the simulation takes a long time (depending on
your machine, up to an hour just to boot Barrelfish).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Hardware assumptions and limitations}

The current state of ARMv7 support in Barrelfish makes a number of
assumptions about the underlying hardware platform, and also imposes
some limitations. We discuss these here.

\section{No support for Large Physical Address Extensions}

The current Barrelfish design does not support LPAE for 32-bit ARM
processors. Instead, it assumes a 32-bit physical address space.
Supporting LPAE would require changes to the paging code, and would
also require a mechanism to address user memory from the kernel
effectively (see below).

\section{Physical RAM starts at 2GB}

Within the 32-bit physical address space, RAM is assumed to start at
the 2GB boundary (i.e. \code{0x80000000}). This is the
architectural recommendation for Cortex-A series processors, and we
have yet to encounter non-LPAE ARMv7-A hardware which does not do
this. Changing this assumption in the code should be possible, but in
practice is likely to be dominated by the other limitations mentioned
here.

\section{Physical RAM is limited to 1GB}

The Barrelfish ARMv7 CPU drivers can handle up to 1GB RAM,
contiguously situated in the physical address space starting at 2GB.
This limit could be raised by half a Gigabyte or so, at the cost of
space for mapping kernel devices. In practice, the CPU driver does not need
to map many kernel devices since most drivers run in user space on
Barrelfish.
Consequently, the allocation of the top 2GB of the
virtual address space between 1-1 mapped RAM and kernel hardware
devices could easily be moved.

However, it remains that the total RAM visible to the CPU \emph{plus}
the mappings for any devices needed by the CPU driver must fit into
the top 2GB of the address space (mapped by the TTBR1 register).

In particular, the CPU driver assumes that all physical RAM is mapped
1-1, and relies on this when performing capability invocations. If
the system had more RAM than could be mapped 1-1 into kernel virtual
address space, we would need a method for the CPU driver to quickly
access arbitrary physical addresses, entailing some kind of paging
system.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Organization of the address space}

Like many other popular operating systems, Barrelfish employs a memory
split. The idea behind a memory split is to separate kernel code from
user space code in the virtual address space. This allows the kernel
to be mapped in every virtual address space of each user space
program, which is necessary to allow user space code to access kernel
features through the system call interface. If the kernel were not
mapped into the virtual address space of each program, it would be
impossible to jump to kernel code without switching the virtual
address space.

\begin{figure}[htb]
  \centering
  \includegraphics[width=8cm]{figures/virtual_addressing.pdf}
  \caption{Barrelfish virtual address space layout for ARMv7-A}
  \label{fig:memory_layout}
\end{figure}

Additionally, ARMv7-A provides two translation table
base registers, TTBR0 and TTBR1. We configure the system to use
TTBR0 for address translations of virtual addresses below 2GB and
TTBR1 for virtual addresses above 2GB.
This saves us the explicit
mapping of the kernel pages into every L1 page table of each process.
Even though the kernel is mapped into each virtual address space, it is
invisible to the user space program: accessing memory that belongs
to the kernel leads to a page fault. Since many mappings can point to
the same physical memory, memory usage is not increased by this
technique.

Figure~\ref{fig:memory_layout} shows the memory layout of the complete
virtual address space of a single ARMv7-A core running Barrelfish.

We have a memory split at 2GB, where everything upwards is only
accessible in privileged modes and the lower 2GB of memory is
accessible to user space programs.

The kernel runs out of the kernel virtual address space where system
RAM is mapped 1-1; in the region between \texttt{0x80000000} and
\texttt{0xC0000000} RAM is mapped directly physical-to-virtual.

The L1 page table of the kernel address space is located inside the
data segment of the kernel right after the
kernel and naturally aligned to 16KB.

We map the whole available physical memory into the kernel's virtual
address space using ``sections'' (1MB large pages), obviating the need
for a kernel L2 page table.

Above \texttt{0xC0000000}, the CPU driver maps regions of physical
memory corresponding to hardware devices it needs to directly access
(typically the UARTs, interrupt controller, timers, Snoop Control
Unit, and a few others). These are also mapped using sections.
Virtual address regions are allocated in 1MB increments (the size of a
section mapping) working down from the top section, which is used to
map the area of RAM containing the CPU driver's exception vectors.

Below \texttt{0x80000000}, all mappings are handled by TTBR0 and
changed on every context switch.
At startup, the kernel uses another
page table (also 16kB-aligned and located inside its data segment) to
map low memory virtual-to-physical as well, as a way to access
hardware devices in this region before the rest of the system has come
up. However, after the early stages of bootstrap this table is no
longer used.

Instead, TTBR0 is always loaded with the address of a user domain's
hardware page table and changes on a context switch. TTBR1 does not
change, ensuring the kernel mappings are static after boot.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Boot sequence}

\section{BSP (initial) core}

\begin{enumerate}
\item \file{boot.S:start} is called by the bootloader. It
  puts the processor into \mode{System}, sets up the (single) kernel
  stack and the global object table pointer, and jumps to
  \file{arch_init}.
\item \file{init.c:arch_init} is called with a single
  argument: the address of the multiboot info block. It first
  initializes the serial console (\file{serial_early_init}) and
  checks to see if this is the BSP. If so, it calls
  \file{bsp_init}.
\item \file{init.c:bsp_init} reads information from the multiboot
  info into the global data structure, initializing it. It also
  resets global spinlocks, and sizes RAM (though this information is
  not yet used). It returns.
\item \file{init.c:arch_init} continues by initializing paging,
  calling:
\item \file{paging.c:paging_init} populates the two initial page
  tables (one for each base register). The kernel (upper) page table
  is initialized to map 1GB of RAM at 0x80000000, and the exception
  vectors at the top of memory. The initial user (lower) page table
  is set to map the lower 2GB of the physical address space 1-1 to
  enable early device access. The MMU is then enabled.
\item \file{init.c:arch_init} continues with the MMU enabled by
  jumping to:
\item \file{init.c:arch_init_2} which initializes exceptions,
  relocates the current KCB, parses the command line arguments, and
  re-initializes the serial ports so that the UART hardware is now
  mapped correctly into kernel address space with a section mapping.

  It then initializes the GIC, the Snoop Control Unit, the Global
  Timer, and the Time Slice Counter. Cycle counter access from
  \mode{User} is enabled, and the coreboot spawn handler is set up. It
  then calls:

\item \file{startup_arch.c:arm_kernel_startup} which initializes
  a simple memory allocator from the global structure, allocates
  a new KCB, and calls:

\item \file{startup_arch.c:spawn_bsp_init} which creates the
  initial kernel data structures for spawning the init process. It
  also creates the initial capabilities for init to use to allocate
  memory, and returns.

\item \file{startup_arch.c:arm_kernel_startup} continues
  by calling \code{dispatch} on the init DCB, and we are now up and
  running.
\end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Exception code paths}

ARMv7-A exceptions are initialized in
\file{exceptions.S:exceptions_init}, which for some reason is
written in assembly. It assumes the core is running in \mode{System}.

There is a 256-byte statically-allocated stack for each exception
mode, and an 8kB stack used for subsequently calling into C in
\mode{System}, all defined in \file{exceptions.S}.

Most exception handlers in the vector table start by checking whether
the processor was in \mode{User} or not when the trap happened. In most
cases, if the processor was not in \mode{User}, the result is that
\mode{System} is entered and the processor jumps to
\file{exn.c:fatal_kernel_fault}, which panics.
Exceptions to this
rule are noted below.

For exceptions taken while the processor is in \mode{User}, the
address of the current (user space) dispatcher is loaded (macro \\
\code{get_dispatcher_shared_arm}), and a check is made to see if the
dispatcher is ``enabled'' (in other words, whether the dispatcher
should be upcalled when next dispatched).

This latter check is performed by the macro \code{disp_is_disabled},
which returns non-zero if:
\begin{enumerate}
\item The \code{disabled} value in the dispatcher (at offset
  \code{OFFSET_OF_DISP_DISABLED}) is non-zero, \emph{or}
\item The PC lies between the two values in the dispatcher with
  offsets \code{OFFSETOF_DISP_CRIT_PC_LOW} and
  \code{OFFSETOF_DISP_CRIT_PC_HIGH}\footnote{A trick suggested by
    Justin Cappos to allow an atomic resume of a user-level thread
    without entering the kernel}.
\end{enumerate}

Depending on this, context is saved in a different area of the
dispatcher, \mode{System} is entered, and a call is made to C code as
noted below.

Taking each exception in turn:

\section{Reset exception}

This is vector offset 0x00, and is not used.

\section{Undefined Instruction exception}

This is vector offset 0x04, and is referred to as
\code{ARM_EVECTOR_UNDEF} in the source.
The processor enters \code{undef_handler} in \mode{Undefined}.
Context is saved in either the \texttt{ENABLED} or \texttt{TRAP} area.
C is entered at \code{exn.c:handle_user_undef}.

\section{Supervisor call (software interrupt)}

This is vector offset 0x08, and referred to as
\code{ARM_EVECTOR_SWI} in the source.
The processor enters \code{swi_handler} in \mode{Supervisor}.

If the syscall was issued from user space, context is saved in either
the \code{ENABLED} or \code{DISABLED} area. C is entered at
\code{syscall.c:sys_syscall}.

If the syscall was issued from kernel space, no context is saved and
C is entered at \code{syscall.c:sys_syscall_kernel}.

\section{Prefetch Abort exception}

This is vector offset 0x0C, and referred to as
\code{ARM_EVECTOR_PABT} in the source.
The processor enters \code{pabt_handler} in \mode{Abort}.

Context is saved in either the \texttt{ENABLED} or \texttt{TRAP} area.
C is entered at \code{exn.c:handle_user_page_fault}.

\section{Data Abort exception}

This is vector offset 0x10, and referred to as
\code{ARM_EVECTOR_DABT} in the source.
The processor enters \code{dabt_handler} in \mode{Abort}.

Context is saved in either the \texttt{ENABLED} or \texttt{TRAP} area.
C is entered at \code{exn.c:handle_user_page_fault} with the faulting
address in \code{r0}.

\section{Hyp Trap, or Hyp mode entry}

This is vector offset 0x14, and is not used in Barrelfish.

\section{IRQ interrupt}

This is vector offset 0x18, and referred to as
\code{ARM_EVECTOR_IRQ} in the source.
The processor enters \code{irq_handler} in \mode{IRQ}.

If the interrupt was taken in user space, context is saved in either
the \code{ENABLED} or \code{DISABLED} area. C is entered at
\code{exn.c:handle_irq}.

If the interrupt was taken in kernel space, context is saved in
\code{irq_save_area}, \mode{System} is entered, and C is called at
\code{exn.c:handle_irq}.

\section{Fast interrupt}

This is vector offset 0x1C, and referred to as
\code{ARM_EVECTOR_FIQ} in the source.
The processor enters \code{fiq_handler} in \mode{FIQ}.

If the interrupt was taken in user space, context is saved in either
the \code{ENABLED} or \code{DISABLED} area. C is entered at
\code{exn.c:handle_irq} (as for IRQ).

If the interrupt was taken in kernel space, context is saved in
\code{irq_save_area}, \mode{System} is entered, and C is called at
\code{exn.c:handle_irq} (as for IRQ).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{The Dispatch mechanism}

Each time a CPU driver decides to switch to running a domain, it
dispatches the domain in one of two ways:

\begin{description}
\item[RESUME], also known as ``disabled'': in this mode, the domain is
  resumed exactly where it was preempted before, much as in operating
  systems like Unix.
\item[UPCALL], also known as ``enabled'': as with Scheduler
  Activations, the domain is upcalled at a fixed address with a new
  context on a small, dedicated stack. The context of the
  previously-running thread in the domain is available to be resumed
  in user space, if the user-level scheduler (also known as the
  activation handler) decides to.
\end{description}

Which one of these happens depends on the state of the domain.

When a domain is running in user space (i.e. the kernel is \emph{not}
executing) the domain is in one of two states, indicated by a
combination of:
\begin{itemize}
\item the \code{disabled} field of the
  \code{struct dispatcher_shared_generic} structure,
\item the current program counter,
\item the \code{crit_pc_low} and \code{crit_pc_high} fields of the
  \code{struct dispatcher_shared_generic} structure.
\end{itemize}

Note that all of these values can be written by the user program.

Specifically, the domain is in \code{RESUME} state \emph{iff}:
\begin{enumerate}
\item \code{disabled} is \code{true}, \emph{or}
\item the current program counter lies between \code{crit_pc_low} and
  \code{crit_pc_high}
\end{enumerate}

Otherwise, it is in state \code{UPCALL}.
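
The rule above can be written as a small predicate. The following is
a minimal sketch in C under stated assumptions, not the CPU driver's
actual code (the exception path performs this check in assembly via
\code{disp_is_disabled}). The structure and function names are
hypothetical, the field types are guesses, and treating the upper
bound of the critical region as exclusive is an assumption, since the
text only says that the program counter ``lies between'' the two
values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three fields of
 * struct dispatcher_shared_generic named in the text. The real
 * structure has more members and may use different types. */
struct disp_state_sketch {
    uint32_t  disabled;      /* non-zero: activations are disabled */
    uintptr_t crit_pc_low;   /* start of the critical region       */
    uintptr_t crit_pc_high;  /* end of the critical region         */
};

/* Returns true if the domain is in RESUME state and false if it is
 * in UPCALL state.
 * Assumption: the critical region is the half-open interval
 * [crit_pc_low, crit_pc_high). */
static bool domain_is_resume(const struct disp_state_sketch *d,
                             uintptr_t pc)
{
    if (d->disabled != 0) {
        return true;    /* condition 1: explicitly disabled */
    }
    /* condition 2: preempted inside the critical region */
    return d->crit_pc_low <= pc && pc < d->crit_pc_high;
}
```

Because all three inputs are user-writable, a predicate of this shape
can only decide how to resume the domain; it must not be relied on for
anything the kernel needs to trust.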

Once the kernel is entered, the \code{disabled} flag of the domain's
\code{struct dcb} structure (as opposed to the
\code{struct dispatcher_shared_generic}) is updated to reflect the state of the
preempted domain.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Key data structures}

\begin{itemize}
\item \code{struct dcb}: in \code{kernel/include/dispatch.h};
  the main domain control block.

  \code{dcb_current} is a global pointer in the CPU driver that points
  to the current DCB.

  If \code{dp} is of type \code{struct dcb *}, then
  \code{dp->disabled} is a flag which is 1 if the current DCB has
  activations disabled (i.e. it should be resumed when next scheduled
  to run) and 0 otherwise (in which case it should be upcalled); the
  analogy is with enabling and disabling interrupts. The flag is set
  on entry to the kernel.

\item \code{struct dispatcher_shared_generic}: in
  \code{include/barrelfish_kpi/dispatcher_shared.h}: the
  architecture-independent part of a dispatcher, the user-space
  data structure corresponding to a DCB. This is the first struct in
  architecture-dependent variants, such as
  \code{struct dispatcher_shared_arm}.

  If \code{dp} is of type \code{struct dispatcher_shared_generic *}, then
  \code{dp->disabled} is a flag which is 1 if the current DCB has

\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Hardware abstraction layers}

Barrelfish distinguishes between:
\begin{itemize}
\item General code
\item Architecture-specific code (e.g. ARMv7-A code)
\item Platform-specific code (e.g.
code for the OMAP4460 SoC)
\end{itemize}

Since most Barrelfish device drivers run in userspace, the difference
between ``platform'' as a chip (such as the OMAP4460) and ``platform''
as a board or complete machine (such as the PandaBoard ES) is
relatively unimportant inside the CPU driver: most of the
platform-specific CPU driver code is actually specific to a chip or
SoC.

Barrelfish CPU driver source code for ARMv7-A systems therefore
consists of the following categories:
\begin{itemize}
\item Portable, architecture-independent code.
\item ARMv7-A-specific code which is common to all ARMv7-A platforms.
\item Code for particular devices or macrocells which are only used on
  ARMv7-A, but might appear on multiple ARMv7-A platforms.
\item Platform-specific code.
\end{itemize}

We restrict platform-specific code to a single source file, which
roughly corresponds to ARM's concept of an ``integrator'', and acts as
a compilation-time indirection layer between common ARMv7-A-specific
code and individual device and macrocell drivers.

\section{The ARMv7-A HAL}

Platform code for a Barrelfish ARMv7-A CPU driver must implement the
following interfaces:

\begin{description}
\item[serial.h]: Low-level drivers for multiple UART devices.
\item[spinlock.h]: Some number of static spinlocks, used for
  coordinating access to e.g. serial devices between CPU drivers on
  different cores.
\end{description}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Code organization}

The variety of ARM platforms makes organizing source trees to maximise
code reuse across different platforms a challenge.

Barrelfish distinguishes between \emph{Architectures}, which are
typically processor architectures like ``ARMv7-A'', and \emph{Platforms},
which are complete system targets, like ``PandaBoard-ES''.

Code and headers specific to a particular architecture are found in
the source tree in various subdirectories of the form
\file{../arch/armv7/}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Versatile Express platform}

%--------------------------------------------------
\chapter{GEM5 specifics}

The GEM5~\cite{gem5:sigarch11} simulator combines the best aspects of
the M5~\cite{m5:micro06} and GEMS~\cite{gems:sigarch05}
simulators. With its flexible and highly modular design, GEM5 allows
the simulation of a wide range of systems. GEM5 supports a wide range
of ISAs like x86, SPARC, Alpha and, in our case most importantly,
ARM. In the following we will list some features of GEM5.

GEM5 supports four different CPU models: AtomicSimple, TimingSimple,
InOrder and O3. The first two are simple one-cycle-per-instruction
CPU models. The difference between the two lies in the way they handle
memory accesses. The AtomicSimple model completes all memory accesses
immediately, whereas the TimingSimple CPU models the timing of memory
accesses. Due to their simplicity, their simulation speed is far above
that of the other two models. The InOrder CPU models an in-order pipeline and
focuses on timing and simulation accuracy. The pipeline can be
configured to model different numbers of stages and hardware threads.
The O3 CPU models a pipelined, out-of-order and possibly superscalar
CPU. It simulates dependencies between instructions, memory
accesses, pipeline stages and functional units. With a load/store
queue and reorder buffer it is possible to simulate superscalar
architectures as well as multiple hardware threads.

The GEM5 simulator provides a tight integration of Python into the
simulator. Python is mainly used for system configuration.
Every
simulated building block of a system is implemented in C++ but is
also reflected as a Python class deriving from a single superclass,
SimObject. This provides a very flexible way of system construction
and allows us to tailor nearly every aspect of the system to our needs.
Python is also used to control the simulation, taking and restoring
snapshots as well as all the command line processing.

We use a VExpress\_EMM based system to run Barrelfish. The number of
cores can be passed as an argument to the GEM5 script. Cores are
clocked at 1 GHz and main memory is 64 MB starting at 2 GB.

\section{Boot process: first (bootstrap) core}

% Source: Samuel's thesis, 4.1.1

This section gives a high-level overview of the boot process of the
Barrelfish kernel on ARMv7-A. Subsequent sections go into more detail
on the individual steps.
\begin{enumerate}
\item Set up kernel stack and ensure privileged mode
\item Allocate L1 page table for kernel
\item Create necessary mappings for address translation
\item Set translation table base register (TTBR) and domain
  permissions
\item Activate MMU, relocate program counter and stack pointer
\item Invalidate TLB, set up arguments for first C function \code{arch_init}
\item Set up exception handling
\item Map the available physical memory in the kernel L1 page table
\item Parse command line and set corresponding variables
\item Initialize devices
\item Create a physical memory map for the available memory
\item Check ramdisk for errors
\item Initialize and switch to init's address space
\item Load init image from ramdisk into memory
\item Load and create capabilities for modules defined by menu.lst
\item Start timer for scheduling
\item Schedule init and switch to user space
\item init brings up the monitor and \code{mem_serv}
\item monitor spawns ramfsd, skb and all the other modules
\end{enumerate}

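
The step that maps the available physical memory into the kernel L1
page table can be illustrated with a short sketch. This is not the
code in \file{paging.c}: all names below are hypothetical, the table is
an ordinary host-side array, and only the bits needed to show the idea
are set. It assumes the ARMv7-A short-descriptor format, in which a
section entry holds the 1MB-aligned physical base in bits 31 to 20,
the type \texttt{0b10} in bits 1 to 0, and access permissions in the AP
field (here privileged read/write with no user access). Memory-type
attributes and the domain field are deliberately left at zero.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTION_SIZE     0x00100000u  /* one section maps 1MB           */
#define L1_ENTRIES       4096u        /* 4GB / 1MB                      */
#define KERNEL_RAM_BASE  0x80000000u  /* RAM starts at 2GB (see text)   */
#define KERNEL_RAM_MAX   0x40000000u  /* at most 1GB is mapped 1-1      */

/* Short-descriptor section entry bits (ARMv7-A). */
#define L1_TYPE_SECTION  0x2u          /* bits[1:0] = 0b10              */
#define L1_AP_PRIV_RW    (0x1u << 10)  /* AP[1:0] = 0b01, AP[2] = 0     */

/* Hypothetical kernel L1 table: 4096 word entries, 16KB-aligned. */
static uint32_t l1_table[L1_ENTRIES] __attribute__((aligned(16384)));

/* Build a section entry for the 1MB region containing paddr. */
static uint32_t section_entry(uint32_t paddr)
{
    return (paddr & 0xFFF00000u) | L1_AP_PRIV_RW | L1_TYPE_SECTION;
}

/* Map ram_bytes of RAM 1-1 starting at KERNEL_RAM_BASE, one section
 * at a time. Returns the number of sections written. */
static size_t map_kernel_ram(uint32_t *table, uint32_t ram_bytes)
{
    size_t count = 0;
    if (ram_bytes > KERNEL_RAM_MAX) {
        ram_bytes = KERNEL_RAM_MAX;   /* the 1GB limit from the text */
    }
    for (uint32_t off = 0; off < ram_bytes; off += SECTION_SIZE) {
        uint32_t addr = KERNEL_RAM_BASE + off;
        table[addr >> 20] = section_entry(addr);  /* index = VA[31:20] */
        count++;
    }
    return count;
}
```

For the 64 MB GEM5 configuration described above, a call such as
\code{map_kernel_ram(l1_table, 64u * 1024u * 1024u)} would fill the 64
entries starting at index \texttt{0x800} and leave the rest of the
table untouched.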

\section{Boot process: subsequent cores}

% Source: Samuel, 4.2.2

The boot up protocol for the multi-core port differs in various ways
from the boot up procedure of our previous single-core port. We
therefore include this revised overview here. The first core is called
the bootstrap processor and every subsequent core is called an
application processor. On the bootstrap processor:

\begin{enumerate}
\item Pass argument from bootloader to first C function \code{arch_init}
\item Make multiboot information passed by bootloader globally
  available
\item Create 1:1 mapping of address space and alias the same region at
  high memory
\item Configure and activate MMU
\item Relocate kernel image to high memory
\item Reset mapping, only map in the physical memory aliased at high
  memory
\item Parse command line and set corresponding variables
\item Initialize devices
\item Initialize and switch to init's address space
\item Load init image into memory
\item Create capabilities for modules defined by the multiboot info
\item Schedule init and switch to user space
\item init brings up the monitor and \code{mem_serv}
\item monitor spawns ramfsd, skb and all the other modules
\item spawnd parses its cmd line and tells the monitor to bring up a
  new core
\item monitor sets up inter-monitor communication channel
\item monitor allocates memory for new kernel and remote monitor
\item monitor loads kernel image and relocates it to destination
  address
\item monitor sets up boot information for new kernel
\item spawnd issues syscall to start new core
\item Kernel writes entry address for new core into SYSFLAG registers
\item Kernel raises software interrupt to start new core
\item Kernel spins on pseudo-lock until other kernel releases it
\item repeat steps 15 to 23 for each application processor
\end{enumerate}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{OMAP44xx platform}

% Source: Claudio 3.1

The OMAP4460 is a system on a chip (SoC) by Texas Instruments,
intended for use in consumer devices like smartphones and tablet
computers. It contains:

\begin{itemize}
\item A dual-core ARM Cortex-A9 processor
\item Two ARM Cortex-M3 processors
\item A hardware spinlock module
\item A mailbox module
\item Many devices to process media input and output
\end{itemize}

The intention is that the Cortex-A9 will be running a general-purpose
operating system, while the Cortex-M3 processors will only be running
a real-time operating system to control the imaging subsystem.

The processor configuration in the OMAP4460 is somewhat
unconventional; for example, the Cortex-M3 processors share a
custom MMU with page faults handled by code running on the Cortex-A9
processors and hence are constrained to run in the same virtual
address space at all times. They are also not cache-coherent with the
Cortex-A9 cores.

\section{Compiling and booting}

To compile Barrelfish for the Pandaboard, first configure your
toolchain as described in Section~\ref{sec:armcompile}. Then execute:

\begin{lstlisting}
cd @\shell@SRC
mkdir build
cd build
../hake/hake.sh -a armv7 -s ../
make pandaboard_image
\end{lstlisting}

The resulting image can be booted on the Pandaboard over the USB OTG
connector using the standard \texttt{usbboot} utility. It will
generate console output on the Pandaboard's serial connector.

\section{Booting the second OMAP A9 core}

% source: AOS m6

Here is a brief overview of how the bootstrapping process for the second core
works: it waits for a signal from the BSP core (an interrupt), and when this
signal is received, the application core reads an address from a
well-defined register and starts executing the code at this address.

To boot the second core, one can write the address of
a function to the register and send the inter-processor
interrupt. The following pointers to the documentation help to
understand the bootstrapping process in more detail:

\begin{itemize}
\item Section 27.4.4 in the OMAP44xx manual talks about the boot process for
  application cores.
\item Pages 1144 \textit{ff.} in the OMAP44xx manual have the register
  layout for the registers that are used in the boot process of the
  second core.
\end{itemize}

Note that the Barrelfish codebase distinguishes between the BSP (bootstrap)
processor and APP (application) processors. This distinction and naming
originates from Intel x86 support, where the BIOS chooses a
distinguished BSP processor at start-up and the OS
is responsible for starting the rest of the processors (the APP
processors). Although it works somewhat differently on
ARM, the naming convention is applicable here as well.

Note also that the second core starts with the MMU
disabled, and therefore runs in the physical address space. The bootstrapping
code sets up a stack, initial page tables and an initial Barrelfish
dispatcher.

\section{Physical address space}

At present, a temporary limitation in the core boot protocol means
that running Barrelfish on both A9 cores requires static partitioning of
the available RAM into two halves, with an independent memory server
running on each core. This will be fixed in a subsequent release.
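
The static partitioning described above can be sketched as follows. The
structure \texttt{ram\_region} and the function
\texttt{ram\_half\_for\_core} are hypothetical names chosen for
illustration, not part of the Barrelfish source, and rounding each half
down to a 4 KB page boundary is an assumption of this sketch.

\begin{lstlisting}
#include <assert.h>
#include <stdint.h>

/* A contiguous range of physical memory. */
struct ram_region {
    uint32_t base;
    uint32_t bytes;
};

/* Return the half of all that is owned by core core_id (0 or 1).
 * Each half is rounded down to a multiple of 4 KB. */
static struct ram_region ram_half_for_core(struct ram_region all,
                                           unsigned core_id)
{
    assert(core_id < 2);
    uint32_t half = (all.bytes / 2u) & ~(uint32_t)0xFFFu;
    struct ram_region r = { all.base + core_id * half, half };
    return r;
}
\end{lstlisting}

With an example region of 1 GB at base \texttt{0x80000000} (example
values only, not a statement about the Pandaboard's memory map), core 0
would receive the range starting at \texttt{0x80000000} and core 1 the
range starting at \texttt{0xA0000000}, each 512 MB long. The memory
server on each core would then manage only its own half.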

\section{Interconnect driver}\label{sec:interconnect}

Communication between A9 cores on the OMAP processor is performed
using a variant of the CC-UMP interconnect driver, modified for the
32-byte cache line size of the ARMv7 architecture. A notification
driver for inter-processor interrupts exists.

The OMAP4460 also has mailbox hardware which can be used by both the
A9 and M3 cores. Barrelfish support for this hardware is in
progress.

\section{M3 cores}

Barrelfish also has rudimentary support for running on both the A9 and
M3 cores. This is limited by the requirement that the M3 cores must
run in the same virtual address space, and do not have a way to
automatically change address space on a kernel trap. For this reason,
we only execute on a single M3 core at present.

Before the Cortex-M3 can start executing code, the following steps
have to be taken by the Cortex-A9:

\begin{enumerate}
\item Power on the Cortex-M3 subsystem
\item Activate the Cortex-M3 subsystem clock
\item Load the image to be executed into memory
\item Enable the L2 MMU
\item Set up mappings for the loaded image in the L2 MMU (can be
  written directly into the TLB)
\item Write the first two entries of the vector table (initial sp and
  reset vector)
\item Take the Cortex-M3 out of reset
\end{enumerate}

Note that the Cortex-M3 runs in a virtual address
space from the very beginning, reading the vector table at virtual
address 0. Inserting a 1:1 mapping for the kernel image greatly
simplifies the bootstrapping of memory management on the Cortex-M3
once it is running, because it needs to know the physical address of
the page tables it sets up.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliographystyle{abbrv}
\bibliography{defs,barrelfish}

\end{document}