\section{Introduction} \newcommand{\libahci}{\lstinline+libahci+\xspace} \subsection{Purpose} The intent behind \libahci is to provide an easy-to-use low-level interface to a single \ac{ahci} port. The main reason why such a library is desirable is to be able to send arbitrary \ac{ata} commands via \ac{ahci} without having to bother with the \ac{ahci} specification details. \subsection{Design} \libahci abstracts the low-level \ac{ahci} operations such as the writing to memory mapped control registers of the \ac{hba}. It exposes an interface similar to that of Flounder-generated interfaces to offer a familiar environment for Barrelfish developers. The library is also used for the \ac{ahci} specific layer of the Flounder \ac{ahci} backend. It acts as a central point for interfacing \ac{ahci} controllers. Apart from handling the sending of \ac{ahci} formatted \ac{ata} messages, \libahci also provides memory management for \acs{dma} regions. \section{DMA Buffer Pool} As all data transfers with \ac{ahci} as transport are done via \acs{dma}, we need a mechanism to manage data buffers that are mapped non-cached. Because Barrelfish does not have memory reclamation for raw frame allocation, we must manage these buffers ourselves and have therefore implemented our own memory subsystem in the form of a \acs{dma} buffer pool, which allows for \acs{dma} buffer allocation and freeing. The user has to call \lstinline+ahci_dma_pool_init+ to initialize the \acs{dma} buffer pool. After that, calls to \lstinline+ahci_dma_region_alloc+ and \lstinline+ahci_dma_region_alloc_aligned+ allocate buffers of the given size rounded up to 512 bytes, and the latter aligns the base address such that {\tt base \% alignment\_requirement == 0}. \lstinline+ahci_dma_region_free+ returns the region it gets passed to the pool. Additionally the buffer pool provides helper functions that facilitate copying data in and out of a buffer (\lstinline+ahci_dma_region_copy_in+ and \lstinline+ahci_dma_region_copy_out+). \begin{center} \begin{minipage}{54mm} \begin{lstlisting}[caption={DMA region handle},label=code:reghandle] struct ahci_dma_region { void *vaddr; genpaddr_t paddr; size_t size; size_t backing_region; }; \end{lstlisting} \end{minipage} \end{center} \begin{figure}[p] \centering \includegraphics[width=.85\textwidth]{dma_pool_design.pdf} \caption{DMA Buffer Pool Design} \label{fig:dma_pool_design} \end{figure} \subsection{Design} The pool memory is organized in regions which are allocated and mapped using \linebreak\lstinline+frame_alloc+ and \lstinline+vspace_map_one_frame+ respectively. The virtual and physical addresses of each of these regions are stored in the fields \lstinline+vaddr+ and \lstinline+paddr+ of \linebreak\lstinline+struct dma_pool+ (c.f.~\autoref{fig:dma_pool_design}). The \acs{dma} buffer pool uses a doubly linked free list for maintaining the free chunks of the memory belonging to the pool. A pointer to the first free chunk of each backing region of the pool is stored in the pool metadata. Additionally pointers to the first and last free chunk are stored. When processing an allocation request, the free list is scanned from the front for a sufficiently free chunk (first-free policy), which is returned in its entirety if it is at most 512 bytes larger than the requested size or split otherwise. If the chunk is split, the request is taken from the end of the chunk and the beginning of the block is left in the free list. If the entire chunk is returned, it is removed from the free list and the appropriate metadata pointers (\lstinline+first_free+, \lstinline+last_free+, and \lstinline+pool.first_free[backing_region]+) are updated, if necessary. If there is no block large enough to satisfy the allocation request, the pool is grown. This is done in steps of 8 megabytes at a time. Growing the pool involves resizing the metadata arrays (\lstinline+virt_addrs+, \lstinline+phys_addrs+, and \lstinline+first_free+) and allocating and mapping memory for the new backing region. Returning a block to the pool is similar: using the info in \lstinline+pool.first_free+, a suitable point in the free list is found, and the block is inserted into the free list. \subsection{Implementation} \lstinline+ahci_dma_region_alloc+ searches through the free list linearly and stops at the first free chunk that meets the condition {\tt request\_size <= chunk\_size}. If no free chunk meets that condition \lstinline+grow_dma_pool+ is called to increase the pool size by eight megabytes and the free list traversal continues with the new memory regions. When a sufficiently large free chunk is found, \lstinline+get_region+ is called. That function checks if the free chunk will be split or not (a chunk is split if the remaining free chunk will be at least 512 bytes), allocates and constructs a \lstinline+struct ahci_dma_region+ for the buffer that will be returned, including computing the virtual and physical addresses of the buffer, and shrinks the free chunk or removes it from the free list (according to the chunk-splitting decision). \lstinline+ahci_dma_pool_init+ calls \lstinline+grow_dma_pool+ with the requested initial pool size rounded up to \lstinline+BASE_PAGE_SIZE+. \lstinline+ahci_dma_region_free+ calls \lstinline+return_region+ on the passed \lstinline+struct ahci_dma_region+. That function inserts the region into the free list. Inserting the region into the free list can take different forms according to the state of the free list before inserting the chunk. After inserting the newly freed chunk into the free list, \lstinline+return_region+ tries to merge the chunk with its predecessor and successor in order to prevent excessive fragmentation of the buffer pool memory. After calling \lstinline+return_region+, the \lstinline+struct ahci_dma_region+ is freed. The last two functions (\lstinline+ahci_dma_region_copy_in+ and \lstinline+ahci_dma_region_copy_out+) are implemented as \lstinline+static inline+ and take a \lstinline+struct ahci_dma_region+, a \lstinline+void*+ data buffer, a \lstinline+genvaddr_t offset+ (into the \acs{dma} region), and a \lstinline+size_t+ size. These functions just calculate the source (for \lstinline+ahci_dma_region_copy_out+) or destination (for \lstinline+ahci_dma_region_copy_in+) pointer for the memcpy and then copy the data. \newcommand{\issuecmd}{\lstinline+ahci_issue_command+\xspace} \section[libahci Interface]{\libahci Interface} \subsection[ahci\_issue\_command]{\issuecmd} \issuecmd is the main function of libahci and takes a \lstinline+void*+ tag with which the user can later match the command completed messages to his issued commands, a \ac{fis} and \ac{fis} length, a boolean flag \lstinline+is_write+ which indicates if \acs{dma} takes place to or from the disk, and a \lstinline+struct vregion*+ data buffer and associated length. \newcommand{\setupcmd}{\lstinline+ahci_setup_command+\xspace} First off \issuecmd calls \setupcmd which allocates a command slot in the port's command header. After that, \setupcmd allocates a command table for the new command that has enough entries to accomodate $\lceil data\_length\allowbreak/\allowbreak prd\_size\rceil$ \acp{prd}. Then \setupcmd inserts the newly allocated command table into the reserved slot in the port's command header and sets the bit to indicate the \acs{dma} direction (according to \lstinline+is_write+) and also sets the \ac{fis} length in the command header slot. Finally, the \ac{fis} is copied into the newly allocated command table and the \lstinline+int *command+ output parameter is assigned the command slot number of the new command. \newcommand{\addprs}{\lstinline+ahci_add_physical_regions+\xspace} After completion of \setupcmd, \issuecmd saves the user's tag into the command slot metadata and proceeds to call \addprs. This function takes the command slot number (\lstinline+int commmand+) and a data buffer, partitions the data buffer into physical regions and inserts those regions into the command slot indicated by \lstinline+command+. The size of the physical regions is specified as at most 4MB and must be an even byte count. However, due to hardware-related problems when using physical regions larger than 128kB we artificially cap the physical region size at 128kB. Memory addresses have to be word aligned. If a constant and predictable physical region size is desired, one can define \lstinline+AHCI_FIXED_PR_SIZE+ and \lstinline+PR_SIZE+ to enforce a specific size for physical regions. Finally \issuecmd sets the issue command bit for the command slot in which the new command is stored and calls the user continuation, if any. \subsection{Command Completed Callback} The command completed callback is called when the \ac{ahci} management daemon receives a interrupt targeted to the \ac{ahci} port which is coupled with the associated \lstinline+struct ahci_binding+. The command completed callback can be adjusted by user code in order to post-process (cleanup, copy-out of read data, etc.) a completed \ac{ahci} command. The management command completed callback in \libahci (which is called from ahcid when the port associated with the current libahci binding receives an interrupt) reads the commmand issue register of the port and calls the user-supplied command completed callback for each command slot which is marked \lstinline+in_use+ in libahci but which has the corresponding bit in the command issue register cleared. The user-supplied command completed callback takes a \lstinline+void *tag+ as its only argument; these tags are also saved in libahci, and should uniquely identify their correpsonding \ac{ahci} command. \newcommand{\ahciinit}{\lstinline+ahci_init+\xspace} \subsection[ahci\_init]{\lstinline+ahci_init+} \ahciinit is the first function a user of \libahci calls. \ahciinit initializes the \lstinline+struct ahci_binding+ for the connection and if the connection to \emph{ahcid} has not yet been established, tries to bind to \emph{ahcid}. The initalization of \libahci continues when the bind callback that was specified in the call to \emph{ahcid} executes. On the first call to \ahciinit, the bind callback sets up the function table for the management binding and then calls \lstinline+ahci_mgmt_open_call__tx+ to request the port specified by the \lstinline+uint8_t port+ parameter of \ahciinit from \emph{ahcid}. The initialization finishes when the ahci management open callback executes. On later \ahciinit calls \ahciinit updates the \emph{ahcid} binding to know about the new \libahci connection and directly calls \lstinline+ahci_mgmt_open_call__tx+. The open callback checks if the open call succeeded, and if so, the memory region containing the registers belonging to the requested port is mapped in the address space in which \libahci executes. After that the receive \ac{fis} area and the command list are set up, a copy of the \texttt{IDENTIFY} data is fetched from \emph{ahcid}, the port is enabled (the \emph{command list running} flag is set to one) and all port interrupts are enabled. \subsection[ahci\_close]{\lstinline+ahci_close+} The purpose of \lstinline+ahci_close+ is to release the port by calling the close function of \emph{ahcid} (c.f.~\autoref{code:ahci_mgmt.if}). This needs to be done, as otherwise \emph{ahcid} will return \verb+AHCI_ERR_+ \verb+PORT_BUSY+ on subsequent open calls for the same port. \subsection[sata\_fis.h]{\lstinline+sata_fis.h+} This header contains definitions dealing with \ac{sata}'s \ac{fis} that are used for sending commands over \ac{ahci}. While the \ac{ata} command specification defines what registers exist for each \ac{fis} type and how they are used, the \ac{sata} specification defines the binary layout of these registers. While it might initially seem that a mackerel specification for these structures would be sufficient, complexity introduced through optional \ac{ata} features makes a custom API preferable. As an example, consider the layout of 28-bit and 48-bit \acp{lba}: for 28 bit \acp{lba}, the lower 24 bits are placed in registers \verb+lba0+ through \verb+lba2+, while the upper 4 bits are placed in the low bits of the \verb+device+ register. However, for 48-bit \ac{lba}, the \verb+device+ register is not used, and the upper 24 bits are placed in register \verb+lba3+ through \verb+lba5+, which are separate from the lower 3 \verb+lba+ registers. \section{Error Handling} A mandatory part of an \ac{ahci} driver is to check if the \ac{hba} signals any errors on command completion. \libahci does check the relevant registers, but the only error handling implemented right now is to dump the registers specifying the error and then aborting the domain that received the error. In order to comply to the \ac{ahci} specification, the software stack (i.e. \libahci) should attempt to recover. Errors signaled by one of the \verb+HBFS+, \verb+HBDS+, \verb+IFS+ or \verb+TFES+ interrupts are fatal and will cause the \ac{hba} to stop processing commmands. To recover from a fatal error, the port needs to be restarted and any pending commands have to be re-issued to the hardware or user level code has to be notified that these commands failed. Errors signaled by the \verb+INFS+ or \verb+OFS+ interrupts are not fatal and the \ac{hba} continues processing commands. In this case the software stack does not have to take any action.