                   MEMORY ATTRIBUTE ALIASING ON IA-64

                             Bjorn Helgaas
                         <bjorn.helgaas@hp.com>
                              May 4, 2006


MEMORY ATTRIBUTES

    Itanium supports several attributes for virtual memory references.
    The attribute is part of the virtual translation, i.e., it is
    contained in the TLB entry. The ones of most interest to the Linux
    kernel are:

        WB              Write-back (cacheable)
        UC              Uncacheable
        WC              Write-coalescing

    System memory typically uses the WB attribute. The UC attribute is
    used for memory-mapped I/O devices. The WC attribute is uncacheable
    like UC is, but writes may be delayed and combined to increase
    performance for things like frame buffers.

    The Itanium architecture requires that we avoid accessing the same
    page with both a cacheable mapping and an uncacheable mapping[1].

    The design of the chipset determines which attributes are supported
    on which regions of the address space. For example, some chipsets
    support either WB or UC access to main memory, while others support
    only WB access.

MEMORY MAP

    Platform firmware describes the physical memory map and the
    supported attributes for each region. At boot-time, the kernel uses
    the EFI GetMemoryMap() interface. ACPI can also describe memory
    devices and the attributes they support, but Linux/ia64 currently
    doesn't use this information.

    The kernel uses the efi_memmap table returned from GetMemoryMap() to
    learn the attributes supported by each region of physical address
    space. Unfortunately, this table does not completely describe the
    address space because some machines omit some or all of the MMIO
    regions from the map.

    The kernel maintains another table, kern_memmap, which describes the
    memory Linux is actually using and the attribute for each region.
    This contains only system memory; it does not contain MMIO space.

    The kern_memmap table typically contains only a subset of the system
    memory described by the efi_memmap.
    Linux/ia64 can't use all memory in the system because of constraints
    imposed by the identity mapping scheme.

    The efi_memmap table is preserved unmodified because the original
    boot-time information is required for kexec.

KERNEL IDENTITY MAPPINGS

    Linux/ia64 identity mappings are done with large pages, currently
    either 16MB or 64MB, referred to as "granules." Cacheable mappings
    are speculative[2], so the processor can read any location in the
    page at any time, independent of the programmer's intentions. This
    means that to avoid attribute aliasing, Linux can create a cacheable
    identity mapping only when the entire granule supports cacheable
    access.

    Therefore, kern_memmap contains only full granule-sized regions that
    can be referenced safely by an identity mapping.

    Uncacheable mappings are not speculative, so the processor will
    generate UC accesses only to locations explicitly referenced by
    software. This allows UC identity mappings to cover granules that
    are only partially populated, or populated with a combination of UC
    and WB regions.

USER MAPPINGS

    User mappings are typically done with 16K or 64K pages. The smaller
    page size allows more flexibility because only 16K or 64K has to be
    homogeneous with respect to memory attributes.

POTENTIAL ATTRIBUTE ALIASING CASES

    There are several ways the kernel creates new mappings:

    mmap of /dev/mem

        This uses remap_pfn_range(), which creates user mappings. These
        mappings may be either WB or UC. If the region being mapped
        happens to be in kern_memmap, meaning that it may also be mapped
        by a kernel identity mapping, the user mapping must use the same
        attribute as the kernel mapping.

        If the region is not in kern_memmap, the user mapping should use
        an attribute reported as being supported in the EFI memory map.

        Since the EFI memory map does not describe MMIO on some
        machines, this should use an uncacheable mapping as a fallback.

    mmap of /sys/class/pci_bus/.../legacy_mem

        This is very similar to mmap of /dev/mem, except that legacy_mem
        only allows mmap of the one megabyte "legacy MMIO" area for a
        specific PCI bus. Typically this is the first megabyte of
        physical address space, but it may be different on machines with
        several VGA devices.

        "X" uses this to access VGA frame buffers. Using legacy_mem
        rather than /dev/mem allows multiple instances of X to talk to
        different VGA cards.

        The /dev/mem mmap constraints apply.

    read/write of /dev/mem

        This uses copy_from_user(), which implicitly uses a kernel
        identity mapping. This is obviously safe for things in
        kern_memmap.

        There may be corner cases of things that are not in kern_memmap,
        but could be accessed this way. For example, registers in MMIO
        space are not in kern_memmap, but could be accessed with a UC
        mapping. This would not cause attribute aliasing. But
        registers typically can be accessed only with four-byte or
        eight-byte accesses, and the copy_from_user() path doesn't allow
        any control over the access size, so this would be dangerous.

    ioremap()

        This returns a mapping for use inside the kernel.

        If the region is in kern_memmap, we should use the attribute
        specified there.

        If the EFI memory map reports that the entire granule supports
        WB, we should use that (granules that are partially reserved
        or occupied by firmware do not appear in kern_memmap).

        If the granule contains non-WB memory, but we can cover the
        region safely with kernel page table mappings, we can use
        ioremap_page_range() as most other architectures do.

        Failing all of the above, we have to fall back to a UC mapping.
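
    The ioremap() fallback order described above can be sketched as a
    small user-space model. This is only an illustration, not kernel
    code: struct region, range_supports(), choose_ioremap() and the
    constants are hypothetical names, the tables are assumed to be
    sorted and non-overlapping, and "can cover the region safely" is
    interpreted here as "the page-aligned region is reported as WB by
    the EFI memory map," which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; the kernel's real tables differ. */
#define ATTR_WB      0x1u                 /* cacheable, speculative */
#define ATTR_UC      0x2u                 /* uncacheable, non-speculative */
#define GRANULE_SIZE (UINT64_C(1) << 24)  /* 16MB identity-mapping granule */
#define SMALL_PAGE   (UINT64_C(1) << 14)  /* 16K page-table page */

struct region {
    uint64_t start;   /* inclusive */
    uint64_t end;     /* exclusive */
    unsigned attrs;   /* supported (efi) or in-use (kern) attributes */
};

enum map_kind {
    MAP_KERN_ATTR,      /* reuse the attribute recorded in kern_memmap */
    MAP_WB_GRANULE,     /* WB identity mapping of the whole granule */
    MAP_WB_PAGE_TABLE,  /* WB page-table mapping of just the region */
    MAP_UC              /* last resort */
};

/*
 * True if every byte of [start, end) lies in an entry whose attrs
 * intersect mask. Assumes entries are sorted and non-overlapping.
 */
static bool range_supports(const struct region *map, size_t n,
                           uint64_t start, uint64_t end, unsigned mask)
{
    uint64_t pos = start;

    assert(start < end);
    for (size_t i = 0; i < n && pos < end; i++) {
        if (map[i].end <= pos)
            continue;               /* entry is entirely below pos */
        if (map[i].start > pos || !(map[i].attrs & mask))
            return false;           /* hole, or attribute unsupported */
        pos = map[i].end;
    }
    return pos >= end;
}

static uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t align_up(uint64_t x, uint64_t a) { return align_down(x + a - 1, a); }

/* Apply the four rules in the order the text lists them. */
static enum map_kind choose_ioremap(const struct region *kern, size_t nk,
                                    const struct region *efi, size_t ne,
                                    uint64_t phys, uint64_t size)
{
    uint64_t end = phys + size;

    if (range_supports(kern, nk, phys, end, ATTR_WB | ATTR_UC))
        return MAP_KERN_ATTR;
    if (range_supports(efi, ne, align_down(phys, GRANULE_SIZE),
                       align_up(end, GRANULE_SIZE), ATTR_WB))
        return MAP_WB_GRANULE;
    if (range_supports(efi, ne, align_down(phys, SMALL_PAGE),
                       align_up(end, SMALL_PAGE), ATTR_WB))
        return MAP_WB_PAGE_TABLE;
    return MAP_UC;
}
```

    With the sx1000 VGA-enabled map listed under PAST PROBLEM CASES, a
    request at 0xC0000 falls through to the page-table rule, while a
    request for the frame buffer at 0xA0000 ends up with UC. The request
    sizes used to exercise the model are illustrative, not taken from
    any driver.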

PAST PROBLEM CASES

    mmap of various MMIO regions from /dev/mem by "X" on Intel platforms

        The EFI memory map may not report these MMIO regions.

        These must be allowed so that X will work. This means that
        when the EFI memory map is incomplete, every /dev/mem mmap must
        succeed. It may create either WB or UC user mappings, depending
        on whether the region is in kern_memmap or the EFI memory map.

    mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled

        See https://bugzilla.novell.com/show_bug.cgi?id=140858.

        The EFI memory map reports the following attributes:
          0x00000-0x9FFFF WB only
          0xA0000-0xBFFFF UC only (VGA frame buffer)
          0xC0000-0xFFFFF WB only

        This mmap is done with user pages, not kernel identity mappings,
        so it is safe to use WB mappings.

        The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
        which uses a granule-sized UC mapping. This granule will cover some
        WB-only memory, but since UC is non-speculative, the processor will
        never generate an uncacheable reference to the WB-only areas unless
        the driver explicitly touches them.

    mmap of 0x0-0xFFFFF legacy_mem by "X"

        If the EFI memory map reports that the entire range supports the
        same attributes, we can allow the mmap (and we will prefer WB if
        supported, as is the case with HP sx[12]000 machines with VGA
        disabled).

        If EFI reports the range as partly WB and partly UC (as on sx[12]000
        machines with VGA enabled), we must fail the mmap because there's no
        safe attribute to use.

        If EFI reports some of the range but not all (as on Intel firmware
        that doesn't report the VGA frame buffer at all), we should fail the
        mmap and force the user to map just the specific region of interest.
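
    The three outcomes listed for the full 0x0-0xFFFFF legacy_mem
    request can be modeled as follows. This is a hedged user-space
    sketch, not the kernel's implementation: struct region,
    common_attrs() and legacy_mem_attr() are hypothetical names, the
    table is assumed to be sorted and non-overlapping, and the model
    covers only the rules stated for this one case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of EFI memory map entries. */
#define ATTR_WB 0x1u
#define ATTR_UC 0x2u

struct region {
    uint64_t start;   /* inclusive */
    uint64_t end;     /* exclusive */
    unsigned attrs;   /* attributes the firmware reports as supported */
};

enum mmap_attr { MMAP_FAIL, MMAP_WB, MMAP_UC };

/*
 * Return the attributes supported by every entry covering [start, end),
 * or 0 if any part of the range is not described by the map.
 */
static unsigned common_attrs(const struct region *map, size_t n,
                             uint64_t start, uint64_t end)
{
    unsigned common = ~0u;
    uint64_t pos = start;

    assert(start < end);
    for (size_t i = 0; i < n && pos < end; i++) {
        if (map[i].end <= pos)
            continue;           /* entry is entirely below pos */
        if (map[i].start > pos)
            return 0;           /* hole: firmware did not report this part */
        common &= map[i].attrs;
        pos = map[i].end;
    }
    return pos >= end ? common : 0;
}

/* Prefer WB, then UC; fail when no single attribute fits the whole range. */
static enum mmap_attr legacy_mem_attr(const struct region *efi, size_t n,
                                      uint64_t start, uint64_t end)
{
    unsigned common = common_attrs(efi, n, start, end);

    if (common & ATTR_WB)
        return MMAP_WB;
    if (common & ATTR_UC)
        return MMAP_UC;
    return MMAP_FAIL;
}
```

    Intersecting the attributes across the range handles the mixed
    WB/UC map and the incompletely reported map with one check: both
    leave no common attribute, so the request is refused.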

    mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled

        The EFI memory map reports the following attributes:
          0x00000-0xFFFFF WB only (no VGA MMIO hole)

        This is a special case of the previous case, and the mmap should
        fail for the same reason as above.

    read of /sys/devices/.../rom

        For VGA devices, this may cause an ioremap() of 0xC0000. This
        used to be done with a UC mapping, because the VGA frame buffer
        at 0xA0000 prevents use of a WB granule. The UC mapping causes
        an MCA on HP sx[12]000 chipsets.

        We should use WB page table mappings to avoid covering the VGA
        frame buffer.

NOTES

    [1] SDM rev 2.2, vol 2, sec 4.4.1.
    [2] SDM rev 2.2, vol 2, sec 4.4.6.