1%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2% Copyright (c) 2015, ETH Zurich.
3% All rights reserved.
4%
5% This file is distributed under the terms in the attached LICENSE file.
6% If you do not find this file, copies can be found by writing to:
7% ETH Zurich D-INFK, Universitaetstr 6, CH-8092 Zurich. Attn: Systems Group.
8%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
9
10\documentclass[a4paper,11pt,twoside]{report}
11\usepackage{bftn}
12\usepackage{calc}
13\usepackage{verbatim}
14\usepackage{xspace}
15\usepackage{pifont}
16\usepackage{pxfonts}
17\usepackage{textcomp}
18\usepackage{amsmath}
19\usepackage{multirow}
20\usepackage{listings}
21\usepackage[framemethod=default]{mdframed}
22\usepackage[shortlabels]{enumitem}
23\usepackage{parskip}
24\usepackage{xparse}
25
26\newcommand{\todo}[1]{[\textcolor{red}{\emph{#1}}]}
27
28\title{CPU drivers in Barrelfish}
29\author{Barrelfish project}
30\tnnumber{21}
31\tnkey{CPU drivers}
32
33\begin{document}
34\maketitle			% Uncomment for final draft
35
36\begin{versionhistory}
37\vhEntry{0.1}{01.12.2015}{GZ}{Initial Version}
38\end{versionhistory}
39
40% \intro{Abstract}		% Insert abstract here
41% \intro{Acknowledgements}	% Uncomment (if needed) for acknowledgements
42\tableofcontents		% Uncomment (if needed) for final draft
43% \listoffigures		% Uncomment (if needed) for final draft
44% \listoftables			% Uncomment (if needed) for final draft
45\cleardoublepage
46\setcounter{secnumdepth}{2}
47
48\newcommand{\fnname}[1]{\textit{\texttt{#1}}}%
49\newcommand{\datatype}[1]{\textit{\texttt{#1}}}%
50\newcommand{\varname}[1]{\texttt{#1}}%
51\newcommand{\keywname}[1]{\textbf{\texttt{#1}}}%
52\newcommand{\pathname}[1]{\texttt{#1}}%
53\newcommand{\tabindent}{\hspace*{3ex}}%
54\newcommand{\sockeye}{\lstinline[language=sockeye]}
55\newcommand{\ccode}{\lstinline[language=C]}
56
57\lstset{
58  language=C,
59  basicstyle=\ttfamily \small,
60  keywordstyle=\bfseries,
61  flexiblecolumns=false,
62  basewidth={0.5em,0.45em},
63  boxpos=t,
64  captionpos=b
65}
66
67\chapter{Introduction}
68\label{chap:introduction}
69
This document describes the CPU driver, the part of Barrelfish that typically
runs in privileged (kernel) mode on our supported architectures.
72
73Barrelfish currently supports the following CPU drivers for
74different CPU architectures and platforms:
75\begin{itemize}
76    \item x86-32
77    \item x86-64
78    \item k1om
79    \item ARMv7
80    \item ...
81    \item ARMv8
82\end{itemize}
83
84\section{General design decisions}
85\begin{itemize}
86\item No dynamic memory allocation
87\item No preemption
88\item ...
89\end{itemize}
90
91\section{Code file structure and layout}
\todo{Explain naming conventions and where architecture-dependent and
platform-specific code goes. Which libraries does the kernel use? Where does
the code shared between libbarrelfish and a CPU driver live?}
94
95\chapter{x86-64}
96
The x86-64 implementation of Barrelfish is specific to the AMD64
and Intel 64 architectures. This text refers to features of
those architectures, which are documented in \cite{intelsa} for
Intel 64 and in \cite{amdsa} for AMD64.
102
103\section{Boot process}
104
105We first describe the boot process for the initial BSP core, followed by
106the boot process of an APP core.
107
108\subsection{BSP Core}
109
Barrelfish relies on a multiboot v1~\cite{multiboot1} compliant bootloader to
load the initial kernel on the BSP core. Our current set-up uses GRUB, which
contains an implementation of the multiboot standard, as the bootloader.
113
On start-up, GRUB searches the supplied kernel module (on x86-64 this is the
binary called elver in \pathname{tools/elver/}) for a magic byte sequence
(defined by multiboot) and begins execution just after that sequence
(see \pathname{tools/elver/boot.S}).
118
\pathname{boot.S} in elver sets up a preliminary GDT, an IA-32e page table,
and a stack for execution. \pathname{elver.c} then searches all the multiboot
modules for a binary called \keywname{kernel} or \keywname{cpu}, relocates
that module, and jumps to the relocated kernel module. At this point, a
1 GiB identity mapping of the physical address space using 2 MiB pages is in
place, so that everything needed initially can be addressed.
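
As a rough illustration of the identity mapping described above (this is not
the code in \pathname{boot.S}, which is written in assembly), 1 GiB mapped
with 2 MiB pages needs exactly one page directory with 512 entries. The flag
bits follow the x86-64 page-table entry layout; the helper
\fnname{fill\_identity\_pd} and the macro names are hypothetical.

\begin{lstlisting}
#include <assert.h>
#include <stdint.h>

#define LARGE_PAGE_SIZE  (2ULL * 1024 * 1024)  /* 2 MiB */
#define PD_ENTRIES       512                   /* entries per page directory */

/* x86-64 page-table entry bits */
#define PTE_PRESENT      (1ULL << 0)
#define PTE_WRITABLE     (1ULL << 1)
#define PTE_LARGE        (1ULL << 7)  /* PS bit: entry maps a 2 MiB page */

/* One page directory of 2 MiB pages covers exactly 1 GiB. */
static_assert(PD_ENTRIES * LARGE_PAGE_SIZE == (1ULL << 30),
              "512 large pages must cover 1 GiB");

/* Fill one page directory so that virtual address == physical address
 * for the first 1 GiB (hypothetical helper). */
static void fill_identity_pd(uint64_t pd[PD_ENTRIES])
{
    for (uint64_t i = 0; i < PD_ENTRIES; i++) {
        pd[i] = (i * LARGE_PAGE_SIZE)
              | PTE_PRESENT | PTE_WRITABLE | PTE_LARGE;
    }
}
\end{lstlisting}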
125
Note that elver exists because multiboot v1 does not support
ELF64 images (or setting up long mode). With a bootloader that supports
loading relocatable ELF64 images into 64-bit mode, elver would be redundant.
129
130After \keywname{elver} is done, execution in the proper BSP kernel program
131begins in \pathname{kernel/arch/x86\_64/boot.S} which then calls
132\fnname{arch\_init}, the first kernel C entry point.
133
134\subsection{APP Core}
135
136APP cores are booted using the coreboot infrastructure in Barrelfish. The
137logic that boots APP cores resides in \pathname{usr/drivers/cpuboot}.
138
The source code responsible for booting a new core on x86 is found in
\pathname{usr/drivers/cpuboot/x86boot.c}, specifically in the function
\fnname{spawn\_xcore\_monitor}, which loads the \keywname{kernel} and
\keywname{monitor} binaries and relocates the kernel. The
function \fnname{start\_aps\_x86\_64\_start} afterwards maps in the
bootstrap code (defined in \pathname{init\_ap\_x86\_64.S}) for booting the
APP core. One complication is that this code has to reside below 1 MiB
in physical memory, since the new APP core starts in real mode and
therefore cannot address anything above that limit in the beginning. Once the
mapping is established, the entry point address for the new APP kernel is
written into this memory region. Finally, a set of system calls is invoked
in order to send the necessary IPIs to bootstrap the new processor.
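
The placement constraint on the bootstrap code can be sketched as follows. On
x86, the start-up IPI carries an 8-bit vector that selects the 4 KiB page at
which the new core begins executing, so the code must be page-aligned and lie
below 1 MiB. The helper \fnname{sipi\_vector\_for} and the constants are
hypothetical and are not taken from \pathname{x86boot.c}.

\begin{lstlisting}
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOW_MEMORY_LIMIT  0x100000u  /* 1 MiB */
#define BASE_PAGE_SIZE    0x1000u    /* 4 KiB */

/* Every page below 1 MiB must be expressible as an 8-bit vector. */
static_assert(LOW_MEMORY_LIMIT / BASE_PAGE_SIZE <= 256,
              "start-up vector must fit in 8 bits");

/* Compute the start-up IPI vector for bootstrap code at physical address
 * paddr. Returns false if the address cannot be encoded. */
static bool sipi_vector_for(uint64_t paddr, uint8_t *vector)
{
    if (paddr >= LOW_MEMORY_LIMIT || (paddr % BASE_PAGE_SIZE) != 0) {
        return false;
    }
    *vector = (uint8_t)(paddr / BASE_PAGE_SIZE);
    return true;
}
\end{lstlisting}

For example, bootstrap code placed at physical address \varname{0x6000} would
be started with vector 6, while an address at or above \varname{0x100000} is
rejected.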
151
152\section{Virtual Address Space}
153
154The page table is constructed by copying VNode capabilities into VNodes to
155link intermediate page tables, and minting Frame / DeviceFrame capabilities
156into leaf VNodes to perform mappings.
157
158When minting a frame capability to a VNode, the frame must be at least as
159large as the smallest page size. The type-specific parameters are:
160
161\begin{enumerate}
162  \item \textbf{Access flags:}
163    The permissible set of flags is PTABLE\_GLOBAL\_PAGE
164    | PTABLE\_ATTR\_INDEX | PTABLE\_CACHE\_DISABLED |
165    PTABLE\_WRITE\_THROUGH. Access flags are set from frame capability
166    access flags. All other flags are not settable from user-space (like
167    PRESENT and SUPERVISOR).
168
  \item \textbf{Number of base-page-sized pages to map:} If non-zero, this
    parameter limits the mapping to the given number of base-page-sized
    pages of the frame (starting from offset zero), instead of mapping the
    entire frame capability.
173\end{enumerate}
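
The two parameters can be read as a mask and a page count. The sketch below
is illustrative only: the flag names come from the list above, but their
values are an assumption (the bit positions of a 4 KiB x86-64 page-table
entry), and both helper functions are hypothetical rather than part of the
Barrelfish API.

\begin{lstlisting}
#include <assert.h>
#include <stdint.h>

/* Assumed values: bit positions of a 4 KiB x86-64 page-table entry. */
#define PTABLE_WRITE_THROUGH   (1ULL << 3)
#define PTABLE_CACHE_DISABLED  (1ULL << 4)
#define PTABLE_ATTR_INDEX      (1ULL << 7)
#define PTABLE_GLOBAL_PAGE     (1ULL << 8)

#define USER_SETTABLE_FLAGS \
    (PTABLE_GLOBAL_PAGE | PTABLE_ATTR_INDEX | \
     PTABLE_CACHE_DISABLED | PTABLE_WRITE_THROUGH)

#define BASE_PAGE_SIZE 4096ULL

/* Drop every requested flag that user-space is not allowed to set. */
static uint64_t filter_user_flags(uint64_t requested)
{
    return requested & USER_SETTABLE_FLAGS;
}

/* Number of base pages to map: pages if non-zero, else the whole frame. */
static uint64_t pages_to_map(uint64_t frame_bytes, uint64_t pages)
{
    uint64_t total = frame_bytes / BASE_PAGE_SIZE;
    assert(pages <= total);
    return pages != 0 ? pages : total;
}
\end{lstlisting}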
174
175\todo{address space layout after initialization is done}
176
177\section{IO capabilities}
178
179IO capabilities provide kernel-mediated access to the legacy IO space of
180the processor. Each IO capability allows access only to a specific range of
181ports.
182
183The Mint invocation (see \autoref{sec:mint}) allows the permissible
184port range to be reduced (with the lower limit in the first
185type-specific parameter, and the upper limit in the second type-specific
186parameter).
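
The effect of such a Mint can be pictured with the following sketch. The
structure \datatype{struct io\_range} and the function
\fnname{io\_range\_mint} are hypothetical, and treating both limits as
inclusive is an assumption made for the example; the sketch only shows that
a derived range may shrink but never grow.

\begin{lstlisting}
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of an IO capability: an inclusive port range. */
struct io_range {
    uint16_t start;
    uint16_t end;
};

/* Derive a narrower range from parent. Fails if the requested range is
 * empty or extends beyond the parent range. */
static bool io_range_mint(const struct io_range *parent, uint16_t start,
                          uint16_t end, struct io_range *child)
{
    assert(parent != NULL && child != NULL);
    if (start > end || start < parent->start || end > parent->end) {
        return false;
    }
    child->start = start;
    child->end = end;
    return true;
}
\end{lstlisting}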
187
188At boot, an IO capability for the entire port space is passed to the
189initial user domain. Aside from being copied or minted, IO capabilities may not
190be created.
191
192\section{Global Descriptor Table (GDT)}
193
The GDT is statically defined and is loaded by the \fnname{gdt\_reset} function during start-up.
195
196The table contains the following entries:
197
198\begin{tabular}{c|l}
199    Index & Description \\ \hline
200    0 & NULL segment  \\
201    1 & Kernel code segment  \\
202    2 & Kernel stack segment  \\
203    3 & User stack segment  \\
204    4 & User code segment  \\
205    5 & Task state segment  \\
206    6 & Task state segment (cont.)  \\
207    7 & Local descriptor table \\
208    8 & Local descriptor table (cont.) \\
209\end{tabular}
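
To relate these indices to the values held in segment registers, recall that
an x86 segment selector consists of the table index shifted left by three
bits, a table-indicator bit (zero for the GDT), and a two-bit requested
privilege level. The sketch below assumes the indices from the table above;
the enumeration and function names are hypothetical.

\begin{lstlisting}
#include <assert.h>
#include <stdint.h>

/* GDT slot indices as listed in the table (names are hypothetical). */
enum gdt_index {
    GDT_NULL   = 0,
    GDT_KCODE  = 1,
    GDT_KSTACK = 2,
    GDT_USTACK = 3,
    GDT_UCODE  = 4,
    GDT_TSS    = 5,  /* occupies slots 5 and 6 */
    GDT_LDT    = 7   /* occupies slots 7 and 8 */
};

/* Build a selector for a GDT entry: (index << 3) | RPL.
 * The table-indicator bit (bit 2) stays 0 to select the GDT. */
static uint16_t gdt_selector(enum gdt_index index, unsigned rpl)
{
    assert(rpl <= 3);
    return (uint16_t)((index << 3) | rpl);
}
\end{lstlisting}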
210
211\section{Interrupts and Exceptions}
212
The initial Interrupt Descriptor Table (IDT) is set up by
\fnname{setup\_default\_idt} in \pathname{irq.c}. The IDT has 256 entries,
which are initialized as follows:
216
217\begin{tabular}{c|l}
218    Index & Description \\ \hline
219    0  &  Divide Error \\
220    1  &  Debug \\
221    2  &  Nonmaskable External Interrupt \\
222    3  &  Breakpoint \\
223    4  &  Overflow \\
224    5  &  Bound Range Exceeded \\
225    6  &  Undefined/Invalid Opcode \\
226    7  &  No Math Coprocessor \\
227    8  &  Double Fault \\
228    9  &  Coprocessor Segment Overrun \\
229    10 &  Invalid TSS \\
230    11 &  Segment Not Present \\
231    12 &  Stack Segment Fault \\
232    13 &  General Protection Fault \\
233    14 &  Page Fault \\
234    15 &  Unused \\
235    16 &  FPU Floating-Point Error \\
236    17 &  Alignment Check \\
237    18 &  Machine Check \\
238    19 &  SIMD Floating-Point Exception \\
239    \hline
240    32 & \multirow{3}{*}{PIC Interrupts} \\
241    \vdots{} & \\
242    47 & \\
243    \hline
244    48 & \multirow{3}{*}{Generic Interrupts} \\
245    \vdots{} & \\
246    61 & \\
247    \hline
248    62 & Tracing IPI \\
249    63 & Tracing IPI \\
250    \hline
251    64 & \multirow{3}{*}{Unused} \\
252    \vdots{} & \\
253    247 & \\
254    \hline
255    248 & Halt IPI (Stopping a core) \\
256    249 & Inter core vector (IPI notifications) \\
257    250 & APIC Timer \\
258    251 & APIC Thermal \\
259    252 & APIC Performance monitoring interrupt \\
260    253 & APIC Error \\
261    254 & APIC Spurious interrupt \\
262    255 & Unused \\
263\end{tabular}
264
The lowest 32 vectors are reserved for CPU exceptions. Except for a
double fault exception, which is always handled by the kernel
directly, an exception is forwarded to the dispatcher handling the
domain on the CPU on which it occurred.
269
Page faults (interrupt 14) are dispatched to the \varname{pagefault} entry
point of the dispatcher. All other exceptions are dispatched to the
\varname{trap} entry point of the dispatcher.
273
There are 224 hardware interrupts, ranging from IRQ number 32 to 255.
The kernel delivers every interrupt that is neither an exception nor
the local APIC timer interrupt to user-space. The local APIC timer
interrupt is used by the kernel for preemptive scheduling and is not
delivered to user-space.
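
The delivery rules above can be summarized in a small classification
function. This is a simplified reading of the text, not the kernel's
dispatch code: the type and function names are hypothetical, and the sketch
ignores any special handling that the IPI and APIC vectors listed in the
table may receive.

\begin{lstlisting}
#include <assert.h>

enum delivery {
    HANDLED_BY_KERNEL,     /* double fault, local APIC timer */
    DISPATCHER_PAGEFAULT,  /* page fault */
    DISPATCHER_TRAP,       /* all other exceptions */
    USER_SPACE_INTERRUPT   /* remaining hardware interrupts */
};

#define VEC_DOUBLE_FAULT  8
#define VEC_PAGE_FAULT    14
#define NUM_EXCEPTIONS    32
#define VEC_APIC_TIMER    250
#define NUM_VECTORS       256

/* Decide where a vector is delivered, following the rules in the text. */
static enum delivery classify_vector(unsigned vector)
{
    assert(vector < NUM_VECTORS);
    if (vector == VEC_DOUBLE_FAULT || vector == VEC_APIC_TIMER) {
        return HANDLED_BY_KERNEL;
    }
    if (vector == VEC_PAGE_FAULT) {
        return DISPATCHER_PAGEFAULT;
    }
    if (vector < NUM_EXCEPTIONS) {
        return DISPATCHER_TRAP;
    }
    return USER_SPACE_INTERRUPT;
}
\end{lstlisting}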
279
Unused entries are initialized with a special handler function. The slots
reserved for generic interrupts can be allocated by user-space applications.
282
283\section{Local Descriptor Table (LDT)}
284
The local descriptor table segment in the GDT
initially points to NULL, as no LDT is installed.

User-space applications can install their own LDT,
which is loaded on a context switch by the
\fnname{maybe\_reload\_ldt} function.
291
292
293\section{Registers}
294
295\paragraph{Segment registers}
296
Segment registers are initialized by the \fnname{gdt\_reset} function during start-up. Each of them points to a GDT entry (the index of the GDT slot for each segment is given in brackets).
298
299\begin{itemize}
300\item[cs] Kernel code segment (1)
301\item[ds] NULL segment (0)
302\item[es] NULL segment (0)
303\item[fs] NULL segment (0)
304\item[gs] NULL segment (0)
305\item[ss] Kernel stack segment (2)
306\end{itemize}
307
308We also note that the \keywname{fs} and \keywname{gs} segment registers are
309preserved and restored across context switches.
310
311\paragraph{General purpose registers}
312\begin{itemize}
313\item \keywname{rcx} contains the start address when running a dispatcher
314for the first time.
315\end{itemize}
316
317\todo{Floating point / SIMD}
318\todo{Machine specific registers (MSR)}
319
320\section{Hardware devices}
321
322\subsection{Serial port}
323On x86, the serial device (a PC16550 compatible controller) is initialized for the first time by the BSP core on boot-up.
324
By default, the serial port at \varname{0x3f8} is used, but the port can be
changed with a command line argument supplied to the kernel.
327
328Notable settings for the serial driver are:
329\begin{itemize}
330    \item Interrupts are disabled.
331    \item FIFOs are enabled.
    \item One stop bit.
333    \item 8 data bits.
334    \item No parity bit.
    \item Baud rate is 115200.
336\end{itemize}
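
As a sketch of how these settings map onto 16550 register values: the baud
rate is selected by a divisor of the maximum rate of 115200 (assuming the
standard 1.8432 MHz UART clock), and the word length is set in the two low
bits of the line control register. The structure and function below are
hypothetical and do not reflect the actual driver code.

\begin{lstlisting}
#include <assert.h>
#include <stdint.h>

#define UART_MAX_BAUD    115200u  /* 1.8432 MHz clock / 16 (assumed) */
#define LCR_8_DATA_BITS  0x03u    /* word-length field = 0b11 */
#define FCR_FIFO_ENABLE  0x01u    /* FIFO control: enable FIFOs */
#define IER_ALL_OFF      0x00u    /* interrupt enable: all disabled */

/* Hypothetical bundle of the register values to program. */
struct uart_config {
    uint16_t divisor;  /* written to the divisor latch */
    uint8_t  lcr;      /* line control register */
    uint8_t  fcr;      /* FIFO control register */
    uint8_t  ier;      /* interrupt enable register */
};

/* Compute register values for the given baud rate, which must divide
 * the maximum rate evenly. */
static struct uart_config uart_config_for(uint32_t baud)
{
    assert(baud != 0 && UART_MAX_BAUD % baud == 0);
    struct uart_config cfg = {
        .divisor = (uint16_t)(UART_MAX_BAUD / baud),
        .lcr     = LCR_8_DATA_BITS,  /* stop-bit and parity bits clear */
        .fcr     = FCR_FIFO_ENABLE,
        .ier     = IER_ALL_OFF,
    };
    return cfg;
}
\end{lstlisting}

With the default rate of 115200, the divisor is 1; a rate of 9600 would give
a divisor of 12.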
337
338The serial device is later re-initialized into a different state once the
339serial driver takes over the device. For example, interrupts will then be
340enabled and handled by the driver.
341
342\subsection{PIC -- Programmable Interrupt Controller}
343\todo{describe}
344
345\subsection{xAPIC -- Advanced Programmable Interrupt Controller}
346\todo{describe}
347
\subsection{System call API}
This section describes the architecture-specific system calls that are not
shared with other architectures.
351
352\begin{itemize}
353\item[7] SYSCALL\_X86\_FPU\_TRAP\_ON: Turn FPU trap on (x86)
354\item[8] SYSCALL\_X86\_RELOAD\_LDT: Reload the LDT register (x86\_64)
355\end{itemize}
356
357
358%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
359\bibliographystyle{abbrv}
360\bibliography{barrelfish}
361
362\end{document}
363