\section{Introduction}

\newcommand{\libahci}{\lstinline+libahci+\xspace}

\subsection{Purpose}

The intent behind \libahci is to provide an easy-to-use low-level interface to
a single \ac{ahci} port. Such a library makes it possible to send arbitrary
\ac{ata} commands via \ac{ahci} without having to deal with the details of the
\ac{ahci} specification.

\subsection{Design}

\libahci abstracts low-level \ac{ahci} operations such as writing to the
memory-mapped control registers of the \ac{hba}. It exposes an interface
similar to that of Flounder-generated interfaces to offer a familiar
environment for Barrelfish developers. The library is also used for the
\ac{ahci}-specific layer of the Flounder \ac{ahci} backend. It acts as a
central point for interfacing with \ac{ahci} controllers.

Apart from handling the sending of \ac{ahci}-formatted \ac{ata} messages,
\libahci also provides memory management for \acs{dma} regions.

\section{DMA Buffer Pool}

As all data transfers with \ac{ahci} as transport are done via \acs{dma}, we
need a mechanism to manage data buffers that are mapped non-cached. Because
Barrelfish does not have memory reclamation for raw frame allocation, we must
manage these buffers ourselves and have therefore implemented our own memory
subsystem in the form of a \acs{dma} buffer pool, which allows for \acs{dma}
buffer allocation and freeing.

The user has to call \lstinline+ahci_dma_pool_init+ to initialize the \acs{dma}
buffer pool. After that, calls to \lstinline+ahci_dma_region_alloc+ and
\lstinline+ahci_dma_region_alloc_aligned+ allocate buffers of the given size
rounded up to a multiple of 512 bytes; the latter additionally aligns the base
address such that {\tt base \% alignment\_requirement == 0}.
\lstinline+ahci_dma_region_free+ returns the region it is passed to the pool.
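
The size rounding and base alignment described above can be illustrated with a small, self-contained sketch. The helper names \lstinline+round_up_512+ and \lstinline+align_up+ are hypothetical and are not part of \libahci; the sketch also assumes that ``rounded up to 512 bytes'' means rounding to the next multiple of 512.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_GRANULE 512u /* allocation granularity mentioned in the text */

/* Hypothetical helper: round a requested size up to the next multiple of
 * 512 bytes. A size that is already a multiple is returned unchanged. */
static size_t round_up_512(size_t size)
{
    return (size + DMA_GRANULE - 1) / DMA_GRANULE * DMA_GRANULE;
}

/* Hypothetical helper: smallest address >= base that satisfies
 * address % alignment == 0. Works for any non-zero alignment,
 * not only powers of two. */
static uint64_t align_up(uint64_t base, uint64_t alignment)
{
    uint64_t rem = base % alignment;
    return rem == 0 ? base : base + (alignment - rem);
}
```

With these helpers, a 513-byte request would occupy 1024 bytes, and a base address of {\tt 0x1001} with a 4096-byte alignment requirement would be moved up to {\tt 0x2000}.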

Additionally, the buffer pool provides helper functions that facilitate copying
data into and out of a buffer (\lstinline+ahci_dma_region_copy_in+ and
\lstinline+ahci_dma_region_copy_out+).

\begin{center}
\begin{minipage}{54mm}
\begin{lstlisting}[caption={DMA region handle},label=code:reghandle]
struct ahci_dma_region {
    void *vaddr;
    genpaddr_t paddr;
    size_t size;
    size_t backing_region;
};
\end{lstlisting}
\end{minipage}
\end{center}

\begin{figure}[p]
\centering
\includegraphics[width=.85\textwidth]{dma_pool_design.pdf}
\caption{DMA Buffer Pool Design}
\label{fig:dma_pool_design}
\end{figure}

\subsection{Design}

The pool memory is organized in regions which are allocated and mapped using
\linebreak\lstinline+frame_alloc+ and \lstinline+vspace_map_one_frame+
respectively. The virtual and physical addresses of each of these regions are
stored in the fields \lstinline+vaddr+ and \lstinline+paddr+ of
\linebreak\lstinline+struct dma_pool+ (cf.~\autoref{fig:dma_pool_design}). The
\acs{dma} buffer pool uses a doubly linked free list to keep track of the free
chunks of memory belonging to the pool. A pointer to the first free chunk
of each backing region of the pool is stored in the pool metadata. Additionally,
pointers to the first and last free chunk of the free list are stored.

When processing an allocation request, the free list is scanned from the front
for a sufficiently large free chunk (first-fit policy). The chunk is returned
in its entirety if it is at most 512 bytes larger than the requested size, and
split otherwise. If the chunk is split, the request is taken from the end of
the chunk and the beginning of the chunk is left in the free list. If the
entire chunk is returned, it is removed from the free list and the appropriate
metadata pointers (\lstinline+first_free+, \lstinline+last_free+, and
\lstinline+pool.first_free[backing_region]+) are updated, if necessary.
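
The allocation policy just described can be sketched as follows. This is a minimal illustration and not the \libahci code: the types \lstinline+struct free_chunk+ and \lstinline+struct free_list+, the function names, and the use of pool offsets instead of addresses are all assumptions made for the example. The per-region \lstinline+first_free+ pointers are omitted, and the treatment of a remainder of exactly 512 bytes (returned whole here) is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define SPLIT_THRESHOLD 512u /* remainder up to this size is not split off */

/* Hypothetical free-chunk descriptor; the real layout may differ. */
struct free_chunk {
    size_t offset;                  /* start of the chunk within the pool */
    size_t size;                    /* chunk size in bytes */
    struct free_chunk *prev, *next; /* doubly linked free list */
};

struct free_list {
    struct free_chunk *first, *last;
};

/* Remove a chunk from the list and fix up the first/last pointers. */
static void unlink_chunk(struct free_list *fl, struct free_chunk *c)
{
    if (c->prev) c->prev->next = c->next; else fl->first = c->next;
    if (c->next) c->next->prev = c->prev; else fl->last = c->prev;
    c->prev = c->next = NULL;
}

/* First-fit allocation. On success returns 0 and stores the offset and
 * size of the allocated range. Returns -1 if no chunk is large enough,
 * in which case a caller would grow the pool and retry. */
static int first_fit_alloc(struct free_list *fl, size_t request,
                           size_t *out_offset, size_t *out_size)
{
    for (struct free_chunk *c = fl->first; c != NULL; c = c->next) {
        if (c->size < request) {
            continue; /* too small, keep scanning */
        }
        if (c->size - request <= SPLIT_THRESHOLD) {
            /* Hand out the whole chunk and drop it from the list. */
            *out_offset = c->offset;
            *out_size = c->size;
            unlink_chunk(fl, c);
        } else {
            /* Split: the request is taken from the end of the chunk,
             * the beginning stays in the free list. */
            c->size -= request;
            *out_offset = c->offset + c->size;
            *out_size = request;
        }
        return 0;
    }
    return -1;
}
```

Taking the request from the end of a chunk means that a split only changes the size of the remaining chunk; its position in the free list and all pointers to it stay valid.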

If there is no block large enough to satisfy the allocation request, the pool
is grown in steps of 8 megabytes. Growing the pool involves resizing the
metadata arrays (\lstinline+virt_addrs+, \lstinline+phys_addrs+, and
\lstinline+first_free+) and allocating and mapping memory for the new backing
region.

Returning a block to the pool is similar: using the information in
\lstinline+pool.first_free+, a suitable point in the free list is found, and
the block is inserted into the free list.

\subsection{Implementation}

\lstinline+ahci_dma_region_alloc+ searches through the free list linearly and
stops at the first free chunk that meets the condition {\tt request\_size <=
chunk\_size}. If no free chunk meets that condition, \lstinline+grow_dma_pool+
is called to increase the pool size by eight megabytes and the free list
traversal continues with the new memory regions. When a sufficiently large
free chunk is found, \lstinline+get_region+ is called. That function decides
whether the free chunk will be split (a chunk is split if the remaining free
chunk will be at least 512 bytes), allocates and constructs a
\lstinline+struct ahci_dma_region+ for the buffer that will be returned,
including computing the virtual and physical addresses of the buffer, and
shrinks the free chunk or removes it from the free list (according to the
chunk-splitting decision).

\lstinline+ahci_dma_pool_init+ calls \lstinline+grow_dma_pool+ with the
requested initial pool size rounded up to \lstinline+BASE_PAGE_SIZE+.

\lstinline+ahci_dma_region_free+ calls \lstinline+return_region+ on the passed
\lstinline+struct ahci_dma_region+. That function inserts the region into the
free list. The insertion can take different forms, depending on the state of
the free list before the chunk is inserted.
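
The different forms an insertion can take may be sketched as below. The sketch assumes that the free list is kept sorted by address (represented here as pool offsets) and that no two chunks share an offset; the text does not state the ordering explicitly. The type and function names are hypothetical, and the per-region \lstinline+first_free+ shortcut is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-chunk descriptor; the real layout may differ. */
struct free_chunk {
    size_t offset;                  /* start of the chunk within the pool */
    size_t size;                    /* chunk size in bytes */
    struct free_chunk *prev, *next; /* doubly linked free list */
};

struct free_list {
    struct free_chunk *first, *last;
};

/* Insert a chunk into an offset-ordered doubly linked free list.
 * Assumes all offsets in the list are distinct. */
static void insert_sorted(struct free_list *fl, struct free_chunk *c)
{
    c->prev = c->next = NULL;

    if (fl->first == NULL) {             /* case 1: list is empty */
        fl->first = fl->last = c;
        return;
    }
    if (c->offset < fl->first->offset) { /* case 2: becomes new head */
        c->next = fl->first;
        fl->first->prev = c;
        fl->first = c;
        return;
    }
    if (c->offset > fl->last->offset) {  /* case 3: becomes new tail */
        c->prev = fl->last;
        fl->last->next = c;
        fl->last = c;
        return;
    }

    /* case 4: interior position. The head lies below c and the tail
     * above it, so the scan stops at a successor that has a predecessor. */
    struct free_chunk *succ = fl->first;
    while (succ->offset < c->offset) {
        succ = succ->next;
    }
    c->next = succ;
    c->prev = succ->prev;
    succ->prev->next = c;
    succ->prev = c;
}
```

Keeping the list ordered by address places each freed chunk directly between its potential neighbours, so only the immediate predecessor and successor need to be inspected for merging.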

After inserting the newly freed chunk into the free list,
\lstinline+return_region+ tries to merge the chunk with its predecessor and
successor in order to prevent excessive fragmentation of the buffer pool
memory. After calling \lstinline+return_region+, the
\lstinline+struct ahci_dma_region+ is freed.

The two copy helpers (\lstinline+ahci_dma_region_copy_in+ and
\lstinline+ahci_dma_region_copy_out+) are implemented as
\lstinline+static inline+ and take a \lstinline+struct ahci_dma_region+, a
\lstinline+void*+ data
buffer, a \lstinline+genvaddr_t offset+ (into the \acs{dma} region), and a
\lstinline+size_t+ size. These functions just calculate the source (for
\lstinline+ahci_dma_region_copy_out+) or destination (for
\lstinline+ahci_dma_region_copy_in+) pointer for the \lstinline+memcpy+ and
then copy the data.

\newcommand{\issuecmd}{\lstinline+ahci_issue_command+\xspace}
\section[libahci Interface]{\libahci Interface}

\subsection[ahci\_issue\_command]{\issuecmd}

\issuecmd is the main function of \libahci. It takes a \lstinline+void*+ tag
with which the user can later match command completed messages to the
commands they issued, a \ac{fis} and \ac{fis} length, a boolean flag
\lstinline+is_write+ which indicates whether \acs{dma} takes place to or from
the disk, and a \lstinline+struct vregion*+ data buffer and associated length.

\newcommand{\setupcmd}{\lstinline+ahci_setup_command+\xspace}
First, \issuecmd calls \setupcmd, which allocates a command slot in the port's
command header. After that, \setupcmd allocates a command table for the new
command that has enough entries to accommodate $\lceil
data\_length\allowbreak/\allowbreak prd\_size\rceil$ \acp{prd}.
Then \setupcmd
inserts the newly allocated command table into the reserved slot in the port's
command header, sets the bit that indicates the \acs{dma} direction (according
to \lstinline+is_write+), and sets the \ac{fis} length in the command
header slot. Finally, the \ac{fis} is copied into the newly allocated command
table and the \lstinline+int *command+ output parameter is assigned the command
slot number of the new command.

\newcommand{\addprs}{\lstinline+ahci_add_physical_regions+\xspace}
After completion of \setupcmd, \issuecmd saves the user's tag into the command
slot metadata and proceeds to call \addprs. This function takes the command
slot number (\lstinline+int command+) and a data buffer, partitions the data
buffer into physical regions and inserts those regions into the command slot
indicated by \lstinline+command+. The size of a physical region is specified
as at most 4~MB and must be an even byte count. However, due to
hardware-related problems when using physical regions larger than 128~kB, we
artificially cap the physical region size at 128~kB. Memory addresses have to
be word aligned. If a constant and predictable physical region size is
desired, one can define \lstinline+AHCI_FIXED_PR_SIZE+ and
\lstinline+PR_SIZE+ to enforce a specific size for physical regions.

Finally, \issuecmd sets the issue command bit for the command slot in which the
new command is stored and calls the user continuation, if any.

\subsection{Command Completed Callback}

The command completed callback is called when the \ac{ahci} management daemon
receives an interrupt targeted at the \ac{ahci} port which is coupled with the
associated \lstinline+struct ahci_binding+. The command completed callback can
be adjusted by user code in order to post-process (cleanup, copy-out of read
data, etc.) a completed \ac{ahci} command.

The management command completed callback in \libahci (which is called from
\emph{ahcid} when the port associated with the current \libahci binding
receives an interrupt) reads the command issue register of the port and calls
the user-supplied command completed callback for each command slot which is
marked \lstinline+in_use+ in \libahci but which has the corresponding bit in
the command issue register cleared.

The user-supplied command completed callback takes a \lstinline+void *tag+ as
its only argument; these tags are also saved in \libahci, and should uniquely
identify their corresponding \ac{ahci} command.

\newcommand{\ahciinit}{\lstinline+ahci_init+\xspace}
\subsection[ahci\_init]{\lstinline+ahci_init+}

\ahciinit is the first function a user of \libahci calls. \ahciinit initializes
the \lstinline+struct ahci_binding+ for the connection and, if the connection
to \emph{ahcid} has not yet been established, tries to bind to \emph{ahcid}.
The initialization of \libahci continues when the bind callback that was
specified in the call to \emph{ahcid} executes.

On the first call to \ahciinit, the bind callback sets up the function table
for the management binding and then calls \lstinline+ahci_mgmt_open_call__tx+
to request the port specified by the \lstinline+uint8_t port+ parameter of
\ahciinit from \emph{ahcid}. The initialization finishes when the \ac{ahci}
management open callback executes.

On later calls, \ahciinit updates the \emph{ahcid} binding to know
about the new \libahci connection and directly calls
\lstinline+ahci_mgmt_open_call__tx+.

The open callback checks if the open call succeeded, and if so, the memory
region containing the registers belonging to the requested port is mapped in
the address space in which \libahci executes.
After that, the receive \ac{fis}
area and the command list are set up, a copy of the \texttt{IDENTIFY} data is
fetched from \emph{ahcid}, the port is enabled (the \emph{command list running}
flag is set to one) and all port interrupts are enabled.

\subsection[ahci\_close]{\lstinline+ahci_close+}

The purpose of \lstinline+ahci_close+ is to release the port by calling the
close function of \emph{ahcid} (cf.~\autoref{code:ahci_mgmt.if}). This needs
to be done, as otherwise \emph{ahcid} will return \verb+AHCI_ERR_+
\verb+PORT_BUSY+ on subsequent open calls for the same port.

\subsection[sata\_fis.h]{\lstinline+sata_fis.h+}

This header contains definitions dealing with \ac{sata}'s \ac{fis} that are
used for sending commands over \ac{ahci}. While the \ac{ata} command
specification defines what registers exist for each \ac{fis} type and how they
are used, the \ac{sata} specification defines the binary layout of these
registers.

While it might initially seem that a Mackerel specification for these
structures would be sufficient, complexity introduced through optional \ac{ata}
features makes a custom API preferable. As an example, consider the layout of
28-bit and 48-bit \acp{lba}: for 28-bit \acp{lba}, the lower 24 bits are placed
in registers \verb+lba0+ through \verb+lba2+, while the upper 4 bits are placed
in the low bits of the \verb+device+ register. However, for 48-bit \acp{lba},
the \verb+device+ register is not used, and the upper 24 bits are placed in
registers \verb+lba3+ through \verb+lba5+, which are separate from the lower
three \verb+lba+ registers.

\section{Error Handling}

A mandatory part of an \ac{ahci} driver is to check if the \ac{hba} signals any
errors on command completion.
\libahci does check the relevant registers, but
the only error handling implemented right now is to dump the registers
specifying the error and then abort the domain that received the error.

In order to comply with the \ac{ahci} specification, the software stack (i.e.
\libahci) should attempt to recover. Errors signaled by one of the \verb+HBFS+,
\verb+HBDS+, \verb+IFS+ or \verb+TFES+ interrupts are fatal and will cause the
\ac{hba} to stop processing commands. To recover from a fatal error, the port
needs to be restarted, and any pending commands either have to be re-issued to
the hardware or user-level code has to be notified that these commands failed.

Errors signaled by the \verb+INFS+ or \verb+OFS+ interrupts are not fatal and
the \ac{hba} continues processing commands. In this case the software stack
does not have to take any action.
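
The distinction between fatal and non-fatal errors could be captured in a small classification helper, sketched below. The enum and function names are hypothetical and do not exist in \libahci. The bit positions are those of the port interrupt status register as we understand them from the \ac{ahci} specification; they should be checked against the specification or the device definition before being relied upon.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in the port interrupt status register (PxIS). */
#define PXIS_OFS  (1u << 24) /* overflow status */
#define PXIS_INFS (1u << 26) /* interface non-fatal error status */
#define PXIS_IFS  (1u << 27) /* interface fatal error status */
#define PXIS_HBDS (1u << 28) /* host bus data error status */
#define PXIS_HBFS (1u << 29) /* host bus fatal error status */
#define PXIS_TFES (1u << 30) /* task file error status */

#define PXIS_FATAL_MASK    (PXIS_HBFS | PXIS_HBDS | PXIS_IFS | PXIS_TFES)
#define PXIS_NONFATAL_MASK (PXIS_INFS | PXIS_OFS)

/* Hypothetical classification of a port interrupt status value. */
enum port_error_class {
    PORT_ERROR_NONE,     /* no error bit set */
    PORT_ERROR_NONFATAL, /* HBA keeps processing commands */
    PORT_ERROR_FATAL     /* port restart and command re-issue needed */
};

/* Fatal errors take precedence when both kinds are signaled at once. */
static enum port_error_class classify_port_error(uint32_t pxis)
{
    if (pxis & PXIS_FATAL_MASK) {
        return PORT_ERROR_FATAL;
    }
    if (pxis & PXIS_NONFATAL_MASK) {
        return PORT_ERROR_NONFATAL;
    }
    return PORT_ERROR_NONE;
}
```

A recovery path would then restart the port only for \lstinline+PORT_ERROR_FATAL+ and could simply log the event for \lstinline+PORT_ERROR_NONFATAL+.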