%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Copyright (c) 2015, ETH Zurich.
% All rights reserved.
%
% This file is distributed under the terms in the attached LICENSE file.
% If you do not find this file, copies can be found by writing to:
% ETH Zurich D-INFK, Universitaetstr 6, CH-8092 Zurich. Attn: Systems Group.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\documentclass[a4paper,11pt,twoside]{report}
\usepackage{bftn}
\usepackage{calc}
\usepackage{verbatim}
\usepackage{xspace}
\usepackage{pifont}
\usepackage{pxfonts}
\usepackage{textcomp}
\usepackage{amsmath}
\usepackage{multirow}
\usepackage{listings}
\usepackage[framemethod=default]{mdframed}
\usepackage[shortlabels]{enumitem}
\usepackage{parskip}
\usepackage{xparse}

\newcommand{\todo}[1]{[\textcolor{red}{\emph{#1}}]}

\title{CPU drivers in Barrelfish}
\author{Barrelfish project}
\tnnumber{21}
\tnkey{CPU drivers}

\begin{document}
\maketitle % Uncomment for final draft

\begin{versionhistory}
\vhEntry{0.1}{01.12.2015}{GZ}{Initial Version}
\end{versionhistory}

% \intro{Abstract} % Insert abstract here
% \intro{Acknowledgements} % Uncomment (if needed) for acknowledgements
\tableofcontents % Uncomment (if needed) for final draft
% \listoffigures % Uncomment (if needed) for final draft
% \listoftables % Uncomment (if needed) for final draft
\cleardoublepage
\setcounter{secnumdepth}{2}

\newcommand{\fnname}[1]{\textit{\texttt{#1}}}%
\newcommand{\datatype}[1]{\textit{\texttt{#1}}}%
\newcommand{\varname}[1]{\texttt{#1}}%
\newcommand{\keywname}[1]{\textbf{\texttt{#1}}}%
\newcommand{\pathname}[1]{\texttt{#1}}%
\newcommand{\tabindent}{\hspace*{3ex}}%
\newcommand{\sockeye}{\lstinline[language=sockeye]}
\newcommand{\ccode}{\lstinline[language=C]}

\lstset{
 language=C,
 basicstyle=\ttfamily \small,
 keywordstyle=\bfseries,
 flexiblecolumns=false,
 basewidth={0.5em,0.45em},
 boxpos=t,
 captionpos=b
}

\chapter{Introduction}
\label{chap:introduction}

This document describes the CPU driver, the part of Barrelfish that typically
runs in privileged mode (also known as kernel mode) on our supported
architectures.

Barrelfish currently provides CPU drivers for the following
CPU architectures and platforms:
\begin{itemize}
 \item x86-32
 \item x86-64
 \item k1om
 \item ARMv7
 \item ...
 \item ARMv8
\end{itemize}

\section{General design decisions}
\begin{itemize}
\item No dynamic memory allocation
\item No preemption
\item ...
\end{itemize}

\section{Code file structure and layout}
\todo{Explain the naming conventions, where architecture-dependent and
platform-specific code goes, which libraries the kernel uses, and where the
code shared between libbarrelfish and a CPU driver lives.}

\chapter{x86-64}

The x86-64 implementation of Barrelfish is specific to the AMD64
and Intel 64 architectures. This text refers to features of
those architectures. These and further features are described in
\cite{intelsa} and \cite{amdsa} for the Intel 64 and AMD64
architectures, respectively.

\section{Boot process}

We first describe the boot process for the initial BSP core, followed by
the boot process of an APP core.

\subsection{BSP Core}

Barrelfish relies on a multiboot v1~\cite{multiboot1} compliant boot-loader to
load the initial kernel on the BSP core. In our current setup we use GRUB as
our boot-loader, which contains an implementation of the multiboot standard.

On start-up, GRUB searches the supplied kernel module (on x86-64 this is the
binary called elver in \pathname{tools/elver/}) for a magic byte sequence
(defined by multiboot) and begins execution just after that sequence
(see \pathname{tools/elver/boot.S}).
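
The header search performed by a multiboot v1 boot-loader can be sketched as
follows. This is a minimal illustration and not the code used by GRUB or
elver: the function name \fnname{find\_multiboot\_header} and the macro names
are hypothetical, while the magic value \varname{0x1BADB002}, the 8 KiB search
window, the 4-byte alignment, and the rule that magic, flags and checksum sum
to zero modulo $2^{32}$ are taken from the multiboot v1
specification~\cite{multiboot1}. The sketch assumes a little-endian host, as
on x86.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from the multiboot v1 specification. */
#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
#define MULTIBOOT_SEARCH_BYTES 8192u /* header lies in the first 8 KiB */

/*
 * Hypothetical helper: returns the byte offset of the multiboot header
 * inside the image, or -1 if no valid header is found.
 */
long find_multiboot_header(const uint8_t *image, size_t len)
{
    size_t limit = len < MULTIBOOT_SEARCH_BYTES ? len : MULTIBOOT_SEARCH_BYTES;

    /* The header is 4-byte aligned; its three mandatory fields
     * (magic, flags, checksum) occupy 12 bytes. */
    for (size_t off = 0; off + 12 <= limit; off += 4) {
        uint32_t magic, flags, checksum;
        memcpy(&magic, image + off, sizeof(magic));
        memcpy(&flags, image + off + 4, sizeof(flags));
        memcpy(&checksum, image + off + 8, sizeof(checksum));

        /* A valid header has the magic value, and the three fields
         * add up to zero as a 32-bit unsigned sum. */
        if (magic == MULTIBOOT_HEADER_MAGIC
            && (uint32_t)(magic + flags + checksum) == 0) {
            return (long)off;
        }
    }
    return -1;
}
\end{lstlisting}

Called on a buffer holding the first bytes of an image, the function returns
the byte offset of the header, or $-1$ if the image does not carry one.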

\pathname{boot.S} in elver sets up a preliminary GDT, an IA-32e page table,
and a stack for execution. \pathname{elver.c} then searches all the multiboot
modules for a binary called \keywname{kernel} or \keywname{cpu}, relocates
that module, and jumps to the relocated kernel module. At
this point, we have set up a 1 GiB identity mapping of the physical address
space using 2 MiB pages in order to address everything we need initially.

Note that elver exists because multiboot v1 does not support
ELF64 images (or setting up long mode). If we used a boot-loader that supports
loading relocatable ELF64 images into 64-bit mode, elver would be redundant.

After \keywname{elver} is done, execution in the proper BSP kernel program
begins in \pathname{kernel/arch/x86\_64/boot.S}, which then calls
\fnname{arch\_init}, the first kernel C entry point.

\subsection{APP Core}

APP cores are booted using the coreboot infrastructure in Barrelfish. The
logic that boots APP cores resides in \pathname{usr/drivers/cpuboot}.

The source code responsible for booting a new core on x86 is found in
\pathname{usr/drivers/cpuboot/x86boot.c}, specifically in the function called
\fnname{spawn\_xcore\_monitor}. \fnname{spawn\_xcore\_monitor} loads the
\keywname{kernel} and \keywname{monitor} binaries, and relocates the kernel. The
function called \fnname{start\_aps\_x86\_64\_start} afterwards maps in the
bootstrap code (which is defined in \pathname{init\_ap\_x86\_64.S}) for booting the
APP core. One complication for this code is that it has to reside below 1 MiB
in physical memory, since the new APP core starts in real mode and
therefore cannot address anything above that limit in the beginning. Once the
mapping is in place, the entry point address for the new APP kernel is
written into this memory region.
Finally, a set of system calls is invoked
to send the IPIs necessary to bootstrap the new processor.

\section{Virtual Address Space}

The page table is constructed by copying VNode capabilities into VNodes to
link intermediate page tables, and minting Frame / DeviceFrame capabilities
into leaf VNodes to perform mappings.

When minting a frame capability to a VNode, the frame must be at least as
large as the smallest page size. The type-specific parameters are:

\begin{enumerate}
 \item \textbf{Access flags:}
 The permissible set of flags is \varname{PTABLE\_GLOBAL\_PAGE
 | PTABLE\_ATTR\_INDEX | PTABLE\_CACHE\_DISABLED |
 PTABLE\_WRITE\_THROUGH}. Access flags are derived from the access flags
 of the frame capability. All other flags (such as
 \varname{PRESENT} and \varname{SUPERVISOR}) cannot be set from user-space.

 \item \textbf{Number of base-page-sized pages to map:} If non-zero, this
 parameter lets the caller map only part of the frame capability
 by specifying the number of base-page-sized pages
 of the region (starting from offset zero) to map.
\end{enumerate}

\todo{address space layout after initialization is done}

\section{IO capabilities}

IO capabilities provide kernel-mediated access to the legacy IO space of
the processor. Each IO capability allows access only to a specific range of
ports.

The Mint invocation (see \autoref{sec:mint}) allows the permissible
port range to be reduced (with the lower limit in the first
type-specific parameter, and the upper limit in the second type-specific
parameter).

At boot, an IO capability for the entire port space is passed to the
initial user domain. IO capabilities cannot be created in any way other than
by copying or minting an existing one.

\section{Global Descriptor Table (GDT)}

The GDT is statically defined and is loaded by the \fnname{gdt\_reset} function during start-up.

The table contains the following entries:

\begin{tabular}{c|l}
 Index & Description \\ \hline
 0 & NULL segment \\
 1 & Kernel code segment \\
 2 & Kernel stack segment \\
 3 & User stack segment \\
 4 & User code segment \\
 5 & Task state segment \\
 6 & Task state segment (cont.) \\
 7 & Local descriptor table \\
 8 & Local descriptor table (cont.) \\
\end{tabular}

\section{Interrupts and Exceptions}

The initial Interrupt Descriptor Table (IDT) is set up by
\fnname{setup\_default\_idt} in \pathname{irq.c}. The IDT has 256 entries,
which are initialized in the following way:

\begin{tabular}{c|l}
 Index & Description \\ \hline
 0 & Divide Error \\
 1 & Debug \\
 2 & Nonmaskable External Interrupt \\
 3 & Breakpoint \\
 4 & Overflow \\
 5 & Bound Range Exceeded \\
 6 & Undefined/Invalid Opcode \\
 7 & No Math Coprocessor \\
 8 & Double Fault \\
 9 & Coprocessor Segment Overrun \\
 10 & Invalid TSS \\
 11 & Segment Not Present \\
 12 & Stack Segment Fault \\
 13 & General Protection Fault \\
 14 & Page Fault \\
 15 & Unused \\
 16 & FPU Floating-Point Error \\
 17 & Alignment Check \\
 18 & Machine Check \\
 19 & SIMD Floating-Point Exception \\
 \hline
 32 & \multirow{3}{*}{PIC Interrupts} \\
 \vdots{} & \\
 47 & \\
 \hline
 48 & \multirow{3}{*}{Generic Interrupts} \\
 \vdots{} & \\
 61 & \\
 \hline
 62 & Tracing IPI \\
 63 & Tracing IPI \\
 \hline
 64 & \multirow{3}{*}{Unused} \\
 \vdots{} & \\
 247 & \\
 \hline
 248 & Halt IPI (Stopping a core) \\
 249 & Inter core vector (IPI notifications) \\
 250 & APIC Timer \\
 251 & APIC Thermal \\
 252 & APIC Performance monitoring interrupt \\
 253 & APIC Error \\
 254 & APIC Spurious interrupt \\
 255 & Unused \\
\end{tabular}

The lower 32 interrupts are reserved as CPU
exceptions. Except for the
double fault exception, which is always handled by the kernel
directly, an exception is forwarded to the dispatcher handling the
domain on the CPU on which it occurred.

Page faults (interrupt 14) are dispatched to the \varname{pagefault} entry
point of the dispatcher. All other exceptions are dispatched to the
\varname{trap} entry point of the dispatcher.

There are 224 hardware interrupts, ranging from IRQ number 32 to 255.
The kernel delivers interrupts that are neither exceptions nor
the local APIC timer interrupt to user-space. The local APIC timer
interrupt is used by the kernel for preemptive scheduling and is not
delivered to user-space.

Unused entries are initialized with a special handler function. The slots
reserved for generic interrupts can be allocated by user-space applications.

\section{Local Descriptor Table (LDT)}

The local descriptor table segment in the GDT
initially points to NULL, as no LDT is installed.

User-space applications can install their own LDT,
which is loaded on a context switch by the
\fnname{maybe\_reload\_ldt} function.


\section{Registers}

\paragraph{Segment registers}

Segment registers are initialized by the \fnname{gdt\_reset} function during start-up, and each of them points to a GDT entry (the index of the GDT slot for each segment is given in parentheses).

\begin{itemize}
\item[cs] Kernel code segment (1)
\item[ds] NULL segment (0)
\item[es] NULL segment (0)
\item[fs] NULL segment (0)
\item[gs] NULL segment (0)
\item[ss] Kernel stack segment (2)
\end{itemize}

We also note that the \keywname{fs} and \keywname{gs} segment registers are
preserved and restored across context switches.

\paragraph{General purpose registers}
\begin{itemize}
\item \keywname{rcx} contains the start address when running a dispatcher
for the first time.
\end{itemize}

\todo{Floating point / SIMD}
\todo{Machine specific registers (MSR)}

\section{Hardware devices}

\subsection{Serial port}
On x86, the serial device (a PC16550-compatible controller) is initialized for the first time by the BSP core on boot-up.

By default, serial port \varname{0x3f8} is used, but the port can be changed
with a command line argument supplied to the kernel.

Notable settings for the serial driver are:
\begin{itemize}
 \item Interrupts are disabled.
 \item FIFOs are enabled.
 \item One stop bit.
 \item 8 data bits.
 \item No parity bit.
 \item Baud rate is 115200.
\end{itemize}

The serial device is later re-initialized into a different state once the
serial driver takes over the device. For example, interrupts are then
enabled and handled by the driver.

\subsection{PIC -- Programmable Interrupt Controller}
\todo{describe}

\subsection{xAPIC -- Advanced Programmable Interrupt Controller}
\todo{describe}

\subsection{System call API}
This section describes the architecture-specific system calls that are not
shared with other architectures.

\begin{itemize}
\item[7] \varname{SYSCALL\_X86\_FPU\_TRAP\_ON}: Turn FPU trap on (x86)
\item[8] \varname{SYSCALL\_X86\_RELOAD\_LDT}: Reload the LDT register (x86\_64)
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliographystyle{abbrv}
\bibliography{barrelfish}

\end{document}