\section{Introduction}

\newcommand{\libahci}{\lstinline+libahci+\xspace}

\subsection{Purpose}

The intent behind \libahci is to provide an easy-to-use low-level interface to
a single \ac{ahci} port. Such a library makes it possible to send arbitrary
\ac{ata} commands via \ac{ahci} without having to bother with the details of
the \ac{ahci} specification.

\subsection{Design}

\libahci abstracts low-level \ac{ahci} operations such as writing to the
memory-mapped control registers of the \ac{hba}. It exposes an interface
similar to that of Flounder-generated interfaces to offer a familiar
environment for Barrelfish developers.  The library is also used for the
\ac{ahci}-specific layer of the Flounder \ac{ahci} backend. It acts as a
central point for interfacing with \ac{ahci} controllers.

Apart from handling the sending of \ac{ahci}-formatted \ac{ata} messages,
\libahci also provides memory management for \acs{dma} regions.

\section{DMA Buffer Pool}

As all data transfers with \ac{ahci} as transport are done via \acs{dma}, we
need a mechanism to manage data buffers that are mapped non-cached. Because
Barrelfish does not have memory reclamation for raw frame allocation, we must
manage these buffers ourselves and have therefore implemented our own memory
subsystem in the form of a \acs{dma} buffer pool, which allows for \acs{dma}
buffer allocation and freeing.

The user has to call \lstinline+ahci_dma_pool_init+ to initialize the \acs{dma}
buffer pool. After that, calls to \lstinline+ahci_dma_region_alloc+ and
\lstinline+ahci_dma_region_alloc_aligned+ allocate buffers of the given size
rounded up to a multiple of 512 bytes; the latter additionally aligns the base
address such that {\tt base \% alignment\_requirement == 0}.
\lstinline+ahci_dma_region_free+ returns the region it is passed to the pool.
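
The size and alignment rules can be illustrated with a minimal sketch. The
helpers \lstinline+round_up_512+ and \lstinline+align_up+ are hypothetical
names chosen for this example and are not part of the \libahci interface; the
actual allocator may compute these values differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed granularity: buffer sizes are multiples of 512 bytes. */
#define DMA_GRANULARITY 512u

/* Round size up to the next multiple of 512 bytes (0 stays 0). */
static size_t round_up_512(size_t size)
{
    return (size + DMA_GRANULARITY - 1) / DMA_GRANULARITY * DMA_GRANULARITY;
}

/* Smallest address >= base with addr % alignment == 0.
 * alignment must be non-zero. */
static uint64_t align_up(uint64_t base, uint64_t alignment)
{
    assert(alignment != 0);
    uint64_t rem = base % alignment;
    return rem == 0 ? base : base + (alignment - rem);
}
```

Under this rule, a request for 513 bytes would occupy 1024 bytes of pool
memory.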

Additionally, the buffer pool provides helper functions that facilitate copying
data in and out of a buffer (\lstinline+ahci_dma_region_copy_in+ and
\lstinline+ahci_dma_region_copy_out+).

\begin{center}
\begin{minipage}{54mm}
\begin{lstlisting}[caption={DMA region handle},label=code:reghandle]
struct ahci_dma_region {
    void *vaddr;
    genpaddr_t paddr;
    size_t size;
    size_t backing_region;
};
\end{lstlisting}
\end{minipage}
\end{center}

\begin{figure}[p]
\centering
\includegraphics[width=.85\textwidth]{dma_pool_design.pdf}
\caption{DMA Buffer Pool Design}
\label{fig:dma_pool_design}
\end{figure}

\subsection{Design}

The pool memory is organized in regions which are allocated and mapped using
\linebreak\lstinline+frame_alloc+ and \lstinline+vspace_map_one_frame+
respectively. The virtual and physical addresses of each of these regions are
stored in the fields \lstinline+vaddr+ and \lstinline+paddr+ of
\linebreak\lstinline+struct dma_pool+ (cf.~\autoref{fig:dma_pool_design}). The
\acs{dma} buffer pool uses a doubly linked free list for maintaining the free
chunks of the memory belonging to the pool.  A pointer to the first free chunk
of each backing region of the pool is stored in the pool metadata. Additionally,
pointers to the first and last free chunk of the whole list are stored.

When processing an allocation request, the free list is scanned from the front
for a sufficiently large free chunk (first-fit policy), which is returned in its
entirety if it is at most 512 bytes larger than the requested size or split
otherwise. If the chunk is split, the request is taken from the end of the
chunk and the beginning of the chunk is left in the free list. If the entire
chunk is returned, it is removed from the free list and the appropriate
metadata pointers (\lstinline+first_free+, \lstinline+last_free+, and
\lstinline+pool.first_free[backing_region]+) are updated, if necessary.

If there is no chunk large enough to satisfy the allocation request, the pool
is grown in steps of 8 megabytes. Growing the pool
involves resizing the metadata arrays (\lstinline+virt_addrs+,
\lstinline+phys_addrs+, and \lstinline+first_free+) and allocating and mapping
memory for the new backing region.

Returning a chunk to the pool is similar: using the information in
\lstinline+pool.first_free+, a suitable point in the free list is found, and
the chunk is inserted into the free list.

\subsection{Implementation}

\lstinline+ahci_dma_region_alloc+ searches through the free list linearly and
stops at the first free chunk that meets the condition {\tt request\_size <=
chunk\_size}. If no free chunk meets that condition, \lstinline+grow_dma_pool+
is called to increase the pool size by eight megabytes and the free list
traversal continues with the new memory region.  When a sufficiently large
free chunk is found, \lstinline+get_region+ is called.  That function decides
whether the free chunk will be split (a chunk is split if the remaining free
chunk will be at least 512 bytes), allocates and constructs a
\lstinline+struct ahci_dma_region+ for the buffer that will be returned,
including computing the virtual and physical addresses of the buffer, and
shrinks the free chunk or removes it from the free list (according to the
chunk-splitting decision).
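
The allocation path might be sketched as follows. This is a simplified model
under stated assumptions, not the \libahci code: chunks are described by an
offset into a single backing region, the types \lstinline+struct free_chunk+
and \lstinline+struct pool+ and the function \lstinline+pool_alloc+ are
hypothetical, and growing the pool is left out. The split rule follows the
description above (split only if at least 512 bytes remain).

```c
#include <assert.h>
#include <stddef.h>

#define MIN_CHUNK 512u  /* a remainder smaller than this is not split off */

/* Hypothetical free-list node: a free chunk [offset, offset + size). */
struct free_chunk {
    size_t offset;
    size_t size;
    struct free_chunk *prev, *next;
};

/* Hypothetical pool metadata: head and tail of the free list. */
struct pool {
    struct free_chunk *first_free, *last_free;
};

/* Unlink c from the doubly linked free list, fixing head and tail. */
static void unlink_chunk(struct pool *p, struct free_chunk *c)
{
    if (c->prev) c->prev->next = c->next; else p->first_free = c->next;
    if (c->next) c->next->prev = c->prev; else p->last_free = c->prev;
    c->prev = c->next = NULL;
}

/*
 * First-fit allocation. On success, stores offset and size of the
 * allocated buffer and returns 1. Returns 0 if no chunk is large enough
 * (the real pool would grow at this point).
 */
static int pool_alloc(struct pool *p, size_t request,
                      size_t *out_offset, size_t *out_size)
{
    assert(p != NULL && out_offset != NULL && out_size != NULL);
    for (struct free_chunk *c = p->first_free; c != NULL; c = c->next) {
        if (request > c->size) {
            continue;                       /* chunk too small */
        }
        if (c->size - request >= MIN_CHUNK) {
            /* Split: hand out the tail, keep the head in the free list. */
            c->size -= request;
            *out_offset = c->offset + c->size;
            *out_size = request;
        } else {
            /* Remainder too small: hand out the whole chunk. */
            *out_offset = c->offset;
            *out_size = c->size;
            unlink_chunk(p, c);
        }
        return 1;
    }
    return 0;
}
```

Taking the request from the end of a chunk means that a split only has to
shrink the size of the existing free-list node; no list pointers change.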

\lstinline+ahci_dma_pool_init+ calls \lstinline+grow_dma_pool+ with the
requested initial pool size rounded up to \lstinline+BASE_PAGE_SIZE+.

\lstinline+ahci_dma_region_free+ calls \lstinline+return_region+ on the passed
\lstinline+struct ahci_dma_region+. That function inserts the region into the
free list. Inserting the region into the free list can take different forms
according to the state of the free list before inserting the chunk.

After inserting the newly freed chunk into the free list,
\lstinline+return_region+ tries to merge the chunk with its predecessor and
successor in order to prevent excessive fragmentation of the buffer pool
memory.  After calling \lstinline+return_region+, the
\lstinline+struct ahci_dma_region+ is freed.
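
The insert-and-merge step could look roughly like the sketch below. It assumes
that the free list is kept ordered by address and models a single backing
region, so the per-region shortcut through \lstinline+pool.first_free+ is
omitted. All names (\lstinline+struct free_chunk+, \lstinline+struct pool+,
\lstinline+pool_return+, \lstinline+merge_with_next+) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-list node: a free chunk [offset, offset + size). */
struct free_chunk {
    size_t offset, size;
    struct free_chunk *prev, *next;
};

/* Hypothetical pool metadata: head and tail of the free list. */
struct pool {
    struct free_chunk *first_free, *last_free;
};

/* Absorb c->next into c if the two chunks are adjacent in memory. */
static void merge_with_next(struct pool *p, struct free_chunk *c)
{
    struct free_chunk *n = c->next;
    if (n == NULL || c->offset + c->size != n->offset) {
        return;
    }
    c->size += n->size;
    c->next = n->next;
    if (n->next) n->next->prev = c; else p->last_free = c;
    n->prev = n->next = NULL;       /* n is no longer part of the list */
}

/*
 * Insert c into the offset-ordered free list and coalesce it with its
 * neighbours. Returns the chunk that covers c's range afterwards.
 */
static struct free_chunk *pool_return(struct pool *p, struct free_chunk *c)
{
    assert(p != NULL && c != NULL);

    /* Find the first chunk that lies behind c; it becomes the successor. */
    struct free_chunk *succ = p->first_free;
    while (succ != NULL && succ->offset < c->offset) {
        succ = succ->next;
    }
    struct free_chunk *pred = succ ? succ->prev : p->last_free;

    /* Link c between pred and succ, updating head and tail if needed. */
    c->prev = pred;
    c->next = succ;
    if (pred) pred->next = c; else p->first_free = c;
    if (succ) succ->prev = c; else p->last_free = c;

    merge_with_next(p, c);          /* try the successor first */
    if (pred != NULL) {
        merge_with_next(p, pred);   /* then the predecessor */
        if (pred->next != c) {
            return pred;            /* c was absorbed into pred */
        }
    }
    return c;
}
```

The different forms of insertion mentioned above correspond to the branches
that update \lstinline+first_free+ and \lstinline+last_free+ in this sketch:
an empty list, insertion at the head, insertion at the tail, and insertion
between two chunks.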

The last two functions (\lstinline+ahci_dma_region_copy_in+ and
\lstinline+ahci_dma_region_copy_out+) are implemented as
\lstinline+static inline+ and take a \lstinline+struct ahci_dma_region+, a
\lstinline+void*+ data buffer, a \lstinline+genvaddr_t offset+ (into the
\acs{dma} region), and a \lstinline+size_t+ size. These functions just
calculate the source (for \lstinline+ahci_dma_region_copy_out+) or destination
(for \lstinline+ahci_dma_region_copy_in+) pointer for the
\lstinline+memcpy+ and then copy the data.
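
A minimal sketch of such helpers is shown below. The function names
\lstinline+copy_in_sketch+ and \lstinline+copy_out_sketch+ are hypothetical,
the address types are replaced by 64-bit stand-ins, and the bounds check is
our addition for illustration; the text above does not say whether the real
helpers validate the range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the Barrelfish address types (assumed 64-bit integers). */
typedef uint64_t genpaddr_t;
typedef uint64_t genvaddr_t;

struct ahci_dma_region {
    void *vaddr;
    genpaddr_t paddr;
    size_t size;
    size_t backing_region;
};

/* Returns 1 if [offset, offset + size) lies inside the region. */
static int range_ok(const struct ahci_dma_region *r,
                    genvaddr_t offset, size_t size)
{
    return offset <= r->size && size <= r->size - offset;
}

/* Copy size bytes from buf into the region, starting at offset. */
static int copy_in_sketch(struct ahci_dma_region *r, const void *buf,
                          genvaddr_t offset, size_t size)
{
    assert(r != NULL && buf != NULL);
    if (!range_ok(r, offset, size)) {
        return -1;
    }
    memcpy((uint8_t *)r->vaddr + offset, buf, size);
    return 0;
}

/* Copy size bytes out of the region, starting at offset, into buf. */
static int copy_out_sketch(const struct ahci_dma_region *r, void *buf,
                           genvaddr_t offset, size_t size)
{
    assert(r != NULL && buf != NULL);
    if (!range_ok(r, offset, size)) {
        return -1;
    }
    memcpy(buf, (const uint8_t *)r->vaddr + offset, size);
    return 0;
}
```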

\newcommand{\issuecmd}{\lstinline+ahci_issue_command+\xspace}
\section[libahci Interface]{\libahci Interface}

\subsection[ahci\_issue\_command]{\issuecmd}

\issuecmd is the main function of \libahci and takes a \lstinline+void*+ tag
with which the user can later match command completed messages to their
issued commands, a \ac{fis} and \ac{fis} length, a boolean flag
\lstinline+is_write+ which indicates whether \acs{dma} takes place to or from
the disk, and a \lstinline+struct vregion*+ data buffer and associated length.

\newcommand{\setupcmd}{\lstinline+ahci_setup_command+\xspace} First,
\issuecmd calls \setupcmd, which allocates a command slot in the port's command
header. After that, \setupcmd allocates a command table for the new command
that has enough entries to accommodate $\lceil
data\_length\allowbreak/\allowbreak prd\_size\rceil$ \acp{prd}. Then \setupcmd
inserts the newly allocated command table into the reserved slot in the port's
command header, sets the bit that indicates the \acs{dma} direction (according
to \lstinline+is_write+), and sets the \ac{fis} length in the command
header slot.  Finally, the \ac{fis} is copied into the newly allocated command
table and the \lstinline+int *command+ output parameter is assigned the command
slot number of the new command.

\newcommand{\addprs}{\lstinline+ahci_add_physical_regions+\xspace} After
completion of \setupcmd, \issuecmd saves the user's tag into the command slot
metadata and proceeds to call \addprs. This function takes the command slot
number (\lstinline+int command+) and a data buffer, partitions the data buffer
into physical regions and inserts those regions into the command slot indicated
by \lstinline+command+. The \ac{ahci} specification limits the size of a
physical region to 4MB, and the size must be an even byte count. However, due
to hardware-related problems when using physical regions larger than 128kB, we
artificially cap the physical region size at 128kB. Memory addresses have to be
word aligned.  If a constant and predictable physical region size is desired,
one can define \lstinline+AHCI_FIXED_PR_SIZE+ and \lstinline+PR_SIZE+ to
enforce a specific size for physical regions.

Finally, \issuecmd sets the issue command bit for the command slot in which the
new command is stored and calls the user continuation, if any.
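
The partitioning of a data buffer into physical regions can be sketched as
follows. This is an illustration under stated assumptions rather than the code
of \addprs: \lstinline+struct phys_region+ is a hypothetical descriptor (not
the hardware \ac{prd} layout), the buffer is assumed to be physically
contiguous, and ``word aligned'' is taken to mean 2-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PR_MAX_SIZE (128u * 1024u)  /* 128kB cap described above */

/* Hypothetical descriptor for one physical region. */
struct phys_region {
    uint64_t base;   /* physical start address */
    uint32_t bytes;  /* length in bytes */
};

/* Number of regions needed: ceil(length / pr_size). */
static size_t pr_count(size_t length, size_t pr_size)
{
    assert(pr_size != 0);
    return (length + pr_size - 1) / pr_size;
}

/*
 * Split [base, base + length) into regions of at most PR_MAX_SIZE bytes.
 * Returns the number of regions written to out, or 0 if the buffer is not
 * word aligned, has an odd length, or does not fit into out_len entries.
 */
static size_t split_into_regions(uint64_t base, size_t length,
                                 struct phys_region *out, size_t out_len)
{
    if (base % 2 != 0 || length % 2 != 0) {
        return 0;
    }
    size_t n = pr_count(length, PR_MAX_SIZE);
    if (n > out_len) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        size_t done = i * (size_t)PR_MAX_SIZE;
        size_t left = length - done;
        out[i].base = base + done;
        out[i].bytes = (uint32_t)(left < PR_MAX_SIZE ? left : PR_MAX_SIZE);
    }
    return n;
}
```

With the 128kB cap, a 300kB transfer is split into two full regions and one
region of 44kB.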

\subsection{Command Completed Callback}

The command completed callback is called when the \ac{ahci} management daemon
receives an interrupt targeted at the \ac{ahci} port which is coupled with the
associated \lstinline+struct ahci_binding+. The command completed callback can
be adjusted by user code in order to post-process (cleanup, copy-out of read
data, etc.) a completed \ac{ahci} command.

The management command completed callback in \libahci (which is called from
\emph{ahcid} when the port associated with the current \libahci binding
receives an interrupt) reads the command issue register of the port and calls
the user-supplied command completed callback for each command slot which is
marked \lstinline+in_use+ in \libahci but which has the corresponding bit in
the command issue register cleared.

The user-supplied command completed callback takes a \lstinline+void *tag+ as
its only argument; these tags are also saved in \libahci, and should uniquely
identify their corresponding \ac{ahci} command.
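
The completion check amounts to a bit mask operation, as the following sketch
shows. The bookkeeping structure \lstinline+struct port_state+, the function
\lstinline+handle_completions+ and the sample callback
\lstinline+count_tag+ are hypothetical; releasing the slot inside the loop is
an assumption of this sketch and is not described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SLOTS 32  /* an AHCI port has at most 32 command slots */

/* Hypothetical per-port bookkeeping. */
struct port_state {
    uint32_t in_use;        /* bit i set: slot i holds an issued command */
    void *tags[NUM_SLOTS];  /* user tag saved when the command was issued */
};

typedef void (*completed_fn)(void *tag);

/*
 * ci is the current value of the port's command issue register. For every
 * slot that is in use but whose issue bit is clear, call cb with the saved
 * tag and release the slot. Returns the number of completed commands.
 */
static int handle_completions(struct port_state *st, uint32_t ci,
                              completed_fn cb)
{
    assert(st != NULL);
    uint32_t done = st->in_use & ~ci;
    int count = 0;
    for (int slot = 0; slot < NUM_SLOTS; slot++) {
        uint32_t bit = UINT32_C(1) << slot;
        if (done & bit) {
            void *tag = st->tags[slot];
            st->in_use &= ~bit;
            st->tags[slot] = NULL;
            if (cb != NULL) {
                cb(tag);
            }
            count++;
        }
    }
    return count;
}

/* Sample user callback: the tag points to a completion counter. */
static void count_tag(void *tag)
{
    (*(int *)tag)++;
}
```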

\newcommand{\ahciinit}{\lstinline+ahci_init+\xspace}
\subsection[ahci\_init]{\lstinline+ahci_init+}

\ahciinit is the first function a user of \libahci calls. \ahciinit initializes
the \lstinline+struct ahci_binding+ for the connection and, if the connection
to \emph{ahcid} has not yet been established, tries to bind to \emph{ahcid}.
The initialization of \libahci continues when the bind callback that was
specified in the call to \emph{ahcid} executes.

On the first call to \ahciinit, the bind callback sets up the function table
for the management binding and then calls \lstinline+ahci_mgmt_open_call__tx+
to request the port specified by the \lstinline+uint8_t port+ parameter of
\ahciinit from \emph{ahcid}. The initialization finishes when the \ac{ahci}
management open callback executes.

On later calls, \ahciinit updates the \emph{ahcid} binding to know
about the new \libahci connection and directly calls
\lstinline+ahci_mgmt_open_call__tx+.

The open callback checks if the open call succeeded, and if so, the memory
region containing the registers belonging to the requested port is mapped in
the address space in which \libahci executes. After that, the receive \ac{fis}
area and the command list are set up, a copy of the \texttt{IDENTIFY} data is
fetched from \emph{ahcid}, the port is enabled (the \emph{command list running}
flag is set to one) and all port interrupts are enabled.

\subsection[ahci\_close]{\lstinline+ahci_close+}

The purpose of \lstinline+ahci_close+ is to release the port by calling the
close function of \emph{ahcid} (cf.~\autoref{code:ahci_mgmt.if}). This needs
to be done, as otherwise \emph{ahcid} will return
\verb+AHCI_ERR_PORT_BUSY+ on subsequent open calls for the same port.

\subsection[sata\_fis.h]{\lstinline+sata_fis.h+}

This header contains definitions dealing with \ac{sata}'s \ac{fis} that are
used for sending commands over \ac{ahci}. While the \ac{ata} command
specification defines what registers exist for each \ac{fis} type and how they
are used, the \ac{sata} specification defines the binary layout of these
registers.

While it might initially seem that a Mackerel specification for these
structures would be sufficient, complexity introduced through optional \ac{ata}
features makes a custom API preferable. As an example, consider the layout of
28-bit and 48-bit \acp{lba}: for 28-bit \acp{lba}, the lower 24 bits are placed
in registers \verb+lba0+ through \verb+lba2+, while the upper 4 bits are placed
in the low bits of the \verb+device+ register. However, for 48-bit \acp{lba},
the \verb+device+ register is not used, and the upper 24 bits are placed in
registers \verb+lba3+ through \verb+lba5+, which are separate from the lower
three \verb+lba+ registers.

\section{Error Handling}

A mandatory part of an \ac{ahci} driver is to check if the \ac{hba} signals any
errors on command completion. \libahci does check the relevant registers, but
the only error handling implemented right now is to dump the registers
specifying the error and then abort the domain that received the error.

In order to comply with the \ac{ahci} specification, the software stack (i.e.,
\libahci) should attempt to recover. Errors signaled by one of the \verb+HBFS+,
\verb+HBDS+, \verb+IFS+ or \verb+TFES+ interrupts are fatal and will cause the
\ac{hba} to stop processing commands. To recover from a fatal error, the port
needs to be restarted, and any pending commands have to be re-issued to the
hardware or user-level code has to be notified that these commands failed.

Errors signaled by the \verb+INFS+ or \verb+OFS+ interrupts are not fatal and
the \ac{hba} continues processing commands. In this case the software stack
does not have to take any action.
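
A first step towards such recovery would be to classify the error. The sketch
below is not part of \libahci: the function \lstinline+classify_port_error+
and the enumeration are hypothetical, and the bit positions are taken from our
reading of the port interrupt status register in the \ac{ahci} specification
and should be checked against it before use.

```c
#include <assert.h>
#include <stdint.h>

/* Port interrupt status bits (assumed positions, see text). */
#define PXIS_OFS   (UINT32_C(1) << 24)  /* overflow */
#define PXIS_INFS  (UINT32_C(1) << 26)  /* interface non-fatal error */
#define PXIS_IFS   (UINT32_C(1) << 27)  /* interface fatal error */
#define PXIS_HBDS  (UINT32_C(1) << 28)  /* host bus data error */
#define PXIS_HBFS  (UINT32_C(1) << 29)  /* host bus fatal error */
#define PXIS_TFES  (UINT32_C(1) << 30)  /* task file error */

enum port_error {
    PORT_OK,            /* no error bit set */
    PORT_ERR_NONFATAL,  /* HBA keeps processing commands */
    PORT_ERR_FATAL      /* port restart required */
};

/* Classify an interrupt status value; fatal errors take precedence. */
static enum port_error classify_port_error(uint32_t is)
{
    if (is & (PXIS_HBFS | PXIS_HBDS | PXIS_IFS | PXIS_TFES)) {
        return PORT_ERR_FATAL;
    }
    if (is & (PXIS_INFS | PXIS_OFS)) {
        return PORT_ERR_NONFATAL;
    }
    return PORT_OK;
}
```

Only the \lstinline+PORT_ERR_FATAL+ case would need to trigger the port
restart and the re-issuing or failing of pending commands.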