=====================================================================================
Intel VT-d (Virtualization Technology for Directed I/O) Driver Overview
=====================================================================================

An overview of Intel Virtualization Technology for Directed I/O can be found on pages
14-19 of the architecture specification (see DOCUMENTATION below).

In the specification and in this README, a domain is defined as an abstract, isolated
environment that consists of a subset of the host physical memory. In the case of
Arrakis, each domain generally corresponds to an application.

=====================================================================================

RELEVANT FILES

Contains errors that may be returned when creating/destroying domains and
adding/removing devices
errors/errno.fugu

Added an x86-64 VNode identity syscall and a VNode identity structure
kernel/arch/x86_64/syscall.c
include/barrelfish_kpi/capabilities.h
include/arch/x86_64/barrelfish/invocations_arch.h

Added a reference to the root PML4 VNode capability
lib/barrelfish/capabilities.c
include/barrelfish/caddr.h

Modified the x86-64 page table entries by adding a vtd_snoop field. Also changed
paging_x86_64_map_large() and paging_x86_64_map() to set the vtd_snoop field
include/target/x86_64/barrelfish_kpi/paging_target.h
kernel/include/target/x86_64/paging_kernel_target.h

Defined the constant VREGION_FLAGS_VTD_SNOOP, which allows the vtd_snoop field to be
set from user-space. This constant is bitwise OR'd with the primary flags used in
user mappings, such as VREGION_FLAGS_READ_WRITE
include/barrelfish/vregion.h

Added RPC calls allowing applications to use the VT-d
if/acpi.if
include/acpi_client/acpi_client.h
lib/acpi_client/acpi_client.c
usr/acpi/acpi_service.c

Changes needed to add a call to the VT-d initialization function.
usr/acpi/acpi.c
usr/acpi/Hakefile

Added a command-line option to disable translation through the VT-d
usr/acpi/acpi_main.c
usr/acpi/acpi_shared.h

Debug printer and its power switch.
usr/acpi/vtd_debug.h

This header contains the functions and structures used for second-level translation.
They are currently only used to establish the identity pagetable, but they could also
be used to establish arbitrary mappings
usr/acpi/vtd_sl_paging.h

Where the VT-d driver implementation is contained
usr/acpi/intel_vtd.h
usr/acpi/intel_vtd.c
usr/acpi/vtd_domains.h

Added an ACPI-enumerated device declaration structure type
usr/acpi/acpica/include/actbl2.h

Mackerel specifications for the remapping hardware register set of each hardware
unit. The offset to the IOTLB registers is found in one of the remapping registers,
so two separate specifications were created. Another will need to be created for
the set of fault registers.
devices/vtd.dev
devices/vtd_iotlb.dev
devices/Hakefile

Where we add the remaining devices to the identity domain
usr/pci/pcimain.c

Contains queries for finding devices to add to the identity domain
usr/skb/programs/pci_queries.pl

Changes made to support DMA remapping through the VT-d for applications using the
network stack and the e10k driver:
if/e10k.if
usr/drivers/e10k/e10k_qdriver.c
usr/drivers/e10k/e10k_cdriver.c
usr/drivers/e10k/e10k_vf.c
usr/drivers/e10k/Hakefile
lib/arranet/arranet.c
lib/arranet/Hakefile

=====================================================================================

IMPLEMENTATION

The VT-d driver is currently coupled with the ACPI daemon.

Domains and domain-ids are managed by a sorted and bounded doubly-linked list.

Remapping hardware units are managed by a simple linked list and use application
pagetables for translation (except for the identity domain).
As a result, applications
may be required to flush the processor caches (more on this below).

At the very end of the initialization function in acpi.c, we make a call to vtd_init
(implemented in intel_vtd.c), where we parse the DMAR table, which consists of
remapping structures that contain devices under their scope. While parsing, we create
remapping unit structures and report all devices we find to the SKB. After we have
finished parsing the DMAR table, we construct the identity domain.

Since each application corresponds to a domain, we want each domain to have access to
all devices. Hence, we establish minimum and maximum domain-id bounds across all
units. Finally, we execute a query to retrieve all applicable devices explicitly
found in the remapping structures and add them to the identity domain. The devices
eligible for the identity domain are all PCIe devices/bridges and all PCI devices
(but not PCI bridges) that reside on the root bus. The reason for the latter
restriction is that PCI devices behind the same bus share the same source-id on
transactions. This also implies that all devices residing behind the same PCI bridge
must be contained in the same domain.

Devices found during PCI bus enumeration are reported to the SKB by the PCI daemon.
After PCI device discovery is complete, an RPC call is made to the ACPI daemon to
extract these devices from the SKB and add them to the identity domain. However,
since drivers are started by Kaluga during device discovery, it is possible for a
device to be added before this call occurs.

For each hardware unit, after all of this occurs, we set the root table, enable DMA
remapping, and report to the SKB which segment translation is enabled on and whether
flushing is required for that segment.

If remapping hardware units are present on the platform, translation is enabled by
default.
However, the command-line option "vtd_force_off" can be supplied to the ACPI
daemon to prevent translation from being enabled on any hardware unit.

An application that wishes to use the VT-d then does the following (excluding any
other changes, such as using virtual addresses in place of physical ones):

(1) Executes a query to determine whether translation is enabled for specific
    segments and whether flushing the CPU caches is required for those segments.

(2) Constructs a domain using its root PML4 as an argument. A reference to it can be
    found in the list of well-known capabilities in include/barrelfish/caddr.h. It is
    identified as cap_vroot.

(3) Adds devices to the constructed domain, using that PML4 as one of the arguments
    to vtd_domain_add_device(). Applications that want to use the same device can do
    so by making SR-IOV copies.

(4) Adds syscall(s) to flush the cache/TLB entries if flushing is required.

(5) Removes the devices from the domain when done.

=====================================================================================

ASSUMPTIONS AND CAVEATS

The VT-d operations, such as constructing a domain, require passing the application's
root PML4 VNode capability. As inter-core transfers of VNode capabilities have not
yet been implemented, any application wishing to use the VT-d must be on the same
core as the ACPI daemon.

The calls for adding/removing devices require the user to specify the segment the
device belongs to. A PCI segment is simply a logical collection of PCI buses, used
for rather complex topologies or hierarchies that may require more than 256 buses.
Arrakis currently doesn't (and probably won't) support PCI segments, so a value of 0
should be used for the segment number.

Currently, the implementation assumes that there is a single remapping unit for each
segment.
Changes will need to be made to account for the possibility that more than one
hardware unit resides on a single segment.

Attempting to map the entire address space with the current implementation, which
constructs the pagetable structure by creating and mapping frames, is infeasible. As
a result, the identity page table only covers the first 1024 GB of physical memory,
but the amount mapped can easily be changed.

Since device drivers are started by Kaluga during PCI bus enumeration, translation
has to be enabled before enumeration begins. However, the paths of devices under the
scope of the remapping structures contained in the DMAR table require knowing
secondary bus numbers for path lengths greater than 2, and this information is only
reported to the SKB during bus enumeration. Currently, we avoid this problem by
assuming that the path length of each device is 2.

Coherency with the processor caches for pages and paging entries is determined by
the Snoop Control and Page-walk Coherency fields in the Extended Capability register,
respectively. If the Snoop Control bit is set to 1, then second-level page table
entries with the SNP bit set will result in the remapping unit snooping the processor
caches (for pages). As noted earlier, the x86-64 page table entries were modified to
contain a vtd_snoop field. The field repurposed for this was an available field, so
this should not cause any problems with application mappings. Also, the vtd_snoop
field is treated as reserved on hardware implementations that have the Snoop Control
bit set to 0. This should allow the VT-d to be used on hardware that does not support
snooping of the CPU caches for pages.

Platforms whose relevant hardware units don't have both of these bits set may
require applications to flush the processor caches.
Flushing may also be required if
the addresses used by the devices in the application's domain are not obtained
from the provided user-mapping functions (e.g. vspace_map_one_frame_attr()). For
performance reasons, syscalls to flush the cache should be avoided in the I/O
path of the application.

Note that there may be other problems/issues that we are not yet aware of.

=====================================================================================

TESTING

The implementation has only been tested with applications using the e10k drivers,
with appropriate changes made to e10k_vf.c, e10k_cdriver.c, e10k_qdriver.c, and
arranet.c (where we construct the domain and add the appropriate device(s)).

These applications are:
Memcached (using the UDP protocol)
UDP echo server

In arranet.c, the processor caches have to be flushed, when required, after
initializing the TX packet descriptors in lwip_arrakis_start().

=====================================================================================

TODO

Fault logging
Interrupt remapping
...

=====================================================================================

DOCUMENTATION

Based on the latest VT-d architecture specification (September 2013):
http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf