	         MEMORY ATTRIBUTE ALIASING ON IA-64

			   Bjorn Helgaas
		       <bjorn.helgaas@hp.com>
			    May 4, 2006


MEMORY ATTRIBUTES

    Itanium supports several attributes for virtual memory references.
    The attribute is part of the virtual translation, i.e., it is
    contained in the TLB entry.  The ones of most interest to the Linux
    kernel are:

	WB		Write-back (cacheable)
	UC		Uncacheable
	WC		Write-coalescing

    System memory typically uses the WB attribute.  The UC attribute is
    used for memory-mapped I/O devices.  The WC attribute is uncacheable
    like UC, but writes may be delayed and combined to increase
    performance for things like frame buffers.

    The Itanium architecture requires that we avoid accessing the same
    page with both a cacheable mapping and an uncacheable mapping[1].

    The design of the chipset determines which attributes are supported
    on which regions of the address space.  For example, some chipsets
    support either WB or UC access to main memory, while others support
    only WB access.

MEMORY MAP

    Platform firmware describes the physical memory map and the
    supported attributes for each region.  At boot time, the kernel
    obtains this map through the EFI GetMemoryMap() interface.  ACPI
    can also describe memory devices and the attributes they support,
    but Linux/ia64 currently doesn't use this information.

    The kernel uses the efi_memmap table returned from GetMemoryMap() to
    learn the attributes supported by each region of physical address
    space.  Unfortunately, this table does not completely describe the
    address space because some machines omit some or all of the MMIO
    regions from the map.

    The kernel maintains another table, kern_memmap, which describes the
    memory Linux is actually using and the attribute for each region.
    This contains only system memory; it does not contain MMIO space.

    The kern_memmap table typically contains only a subset of the system
    memory described by the efi_memmap.  Linux/ia64 can't use all memory
    in the system because of constraints imposed by the identity mapping
    scheme.

    The efi_memmap table is preserved unmodified because the original
    boot-time information is required for kexec.

KERNEL IDENTITY MAPPINGS

    Linux/ia64 identity mappings are done with large pages, currently
    either 16MB or 64MB, referred to as "granules."  Cacheable mappings
    are speculative[2], so the processor can read any location in the
    page at any time, independent of the programmer's intentions.  This
    means that to avoid attribute aliasing, Linux can create a cacheable
    identity mapping only when the entire granule supports cacheable
    access.

    Therefore, kern_memmap contains only full granule-sized regions that
    can be referenced safely by an identity mapping.
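
    The granule rule above can be modeled in a few lines of userspace
    Python.  This is only an illustrative sketch: the map layout, the
    covered() and wb_granules() helpers, and the sample addresses are
    assumptions made for the example, not the kernel's data structures
    or code.

```python
GRANULE = 16 << 20  # assume 16MB granules; the text says 16MB or 64MB

# Hypothetical firmware map: (start, end_exclusive, supported attributes).
EFI_MAP = [
    (0x0000000, 0x00A0000, {"WB"}),
    (0x00A0000, 0x00C0000, {"UC"}),        # e.g. an MMIO hole
    (0x00C0000, 0x4000000, {"WB", "UC"}),
]

def covered(memmap, start, end, attr):
    """True if every byte of [start, end) lies in a region supporting attr."""
    pos = start
    for r_start, r_end, attrs in sorted(memmap, key=lambda r: r[0]):
        if r_end <= pos:
            continue              # region lies entirely below the cursor
        if r_start > pos or attr not in attrs:
            return False          # unreported hole, or attribute unsupported
        pos = r_end
        if pos >= end:
            return True
    return False                  # the map ended before the range did

def wb_granules(memmap, granule=GRANULE):
    """Start addresses of granules that support WB over their full length."""
    top = max(r_end for _, r_end, _ in memmap)
    return [g for g in range(0, top, granule)
            if covered(memmap, g, g + granule, "WB")]

print([hex(g) for g in wb_granules(EFI_MAP)])
# prints ['0x1000000', '0x2000000', '0x3000000']
```

    The first granule is rejected even though most of it supports WB,
    because a cacheable identity mapping would also cover the UC-only
    hole.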

    Uncacheable mappings are not speculative, so the processor will
    generate UC accesses only to locations explicitly referenced by
    software.  This allows UC identity mappings to cover granules that
    are only partially populated, or populated with a combination of UC
    and WB regions.

USER MAPPINGS

    User mappings are typically done with 16K or 64K pages.  The smaller
    page size allows more flexibility because only a 16K or 64K range
    has to be homogeneous with respect to memory attributes.

POTENTIAL ATTRIBUTE ALIASING CASES

    There are several ways the kernel creates new mappings:

    mmap of /dev/mem

	This uses remap_pfn_range(), which creates user mappings.  These
	mappings may be either WB or UC.  If the region being mapped
	happens to be in kern_memmap, meaning that it may also be mapped
	by a kernel identity mapping, the user mapping must use the same
	attribute as the kernel mapping.

	If the region is not in kern_memmap, the user mapping should use
	an attribute reported as being supported in the EFI memory map.

	Since the EFI memory map does not describe MMIO on some
	machines, a region that appears in neither table should fall
	back to an uncacheable mapping.
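
	The three rules above form a simple cascade, sketched here as a
	userspace Python model.  The function names, the tuple layout of
	both tables, and the sample addresses are hypothetical.  Trying WB
	before UC in the EFI step is an assumption of this sketch, and a
	range only partly inside kern_memmap is not treated specially.

```python
def covered(memmap, start, end, attr):
    """True if every byte of [start, end) lies in a region supporting attr."""
    pos = start
    for r_start, r_end, attrs in sorted(memmap, key=lambda r: r[0]):
        if r_end <= pos:
            continue
        if r_start > pos or attr not in attrs:
            return False
        pos = r_end
        if pos >= end:
            return True
    return False

def dev_mem_mmap_attr(kern_memmap, efi_map, start, end):
    """Pick the attribute for a user mapping of [start, end)."""
    # 1. In kern_memmap: must match the kernel identity mapping.
    for attr in ("WB", "UC"):
        if covered(kern_memmap, start, end, attr):
            return attr
    # 2. Otherwise use an attribute the EFI map reports for the whole range
    #    (WB first is an assumption of this sketch).
    for attr in ("WB", "UC"):
        if covered(efi_map, start, end, attr):
            return attr
    # 3. Described by neither table (e.g. unreported MMIO): fall back to UC.
    return "UC"

# Hypothetical tables: (start, end_exclusive, attributes).
KERN_MEMMAP = [(0x1000000, 0x4000000, {"WB"})]
EFI_MAP = [
    (0x0000000, 0x00A0000, {"WB"}),
    (0x00A0000, 0x00C0000, {"UC"}),
    (0x00C0000, 0x4000000, {"WB"}),
]

print(dev_mem_mmap_attr(KERN_MEMMAP, EFI_MAP, 0x1000000, 0x1004000))    # WB
print(dev_mem_mmap_attr(KERN_MEMMAP, EFI_MAP, 0x00A0000, 0x00C0000))    # UC
print(dev_mem_mmap_attr(KERN_MEMMAP, EFI_MAP, 0x80000000, 0x80004000))  # UC
```

	The last call models an MMIO region that the firmware omitted from
	its map: nothing describes it, so the mapping is uncacheable.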

    mmap of /sys/class/pci_bus/.../legacy_mem

	This is very similar to mmap of /dev/mem, except that legacy_mem
	only allows mmap of the one megabyte "legacy MMIO" area for a
	specific PCI bus.  Typically this is the first megabyte of
	physical address space, but it may be different on machines with
	several VGA devices.

	"X" uses this to access VGA frame buffers.  Using legacy_mem
	rather than /dev/mem allows multiple instances of X to talk to
	different VGA cards.

	The /dev/mem mmap constraints apply.

    read/write of /dev/mem

	This uses copy_from_user(), which implicitly uses a kernel
	identity mapping.  This is safe for regions in kern_memmap.

	There may be corner cases of regions that are not in kern_memmap
	but could be accessed this way.  For example, registers in MMIO
	space are not in kern_memmap, but could be accessed with a UC
	mapping.  This would not cause attribute aliasing.  But
	registers typically can be accessed only with four-byte or
	eight-byte accesses, and the copy_from_user() path doesn't allow
	any control over the access size, so this would be dangerous.

    ioremap()

	This returns a mapping for use inside the kernel.

	If the region is in kern_memmap, we should use the attribute
	specified there.

	Otherwise, if the EFI memory map reports that the entire granule
	supports WB, we should use that (granules that are partially
	reserved or occupied by firmware do not appear in kern_memmap).

	If the granule contains non-WB memory, but we can cover the
	region safely with kernel page table mappings, we can use
	ioremap_page_range() as most other architectures do.

	Failing all of the above, we have to fall back to a UC mapping.
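
	The ioremap() decision order can be sketched as the following
	userspace Python model.  Everything here is hypothetical: the
	ioremap_plan() name, the table layout, the 16K page and 16MB
	granule sizes chosen for the example, and the reading of "cover
	the region safely" as "every page touched by the region supports
	WB".  It illustrates the order of the checks, not the kernel's
	implementation.

```python
PAGE = 16 << 10      # assumed page size for the example
GRANULE = 16 << 20   # assumed granule size for the example

def covered(memmap, start, end, attr):
    """True if every byte of [start, end) lies in a region supporting attr."""
    pos = start
    for r_start, r_end, attrs in sorted(memmap, key=lambda r: r[0]):
        if r_end <= pos:
            continue
        if r_start > pos or attr not in attrs:
            return False
        pos = r_end
        if pos >= end:
            return True
    return False

def round_out(start, end, unit):
    """Expand [start, end) outward to multiples of unit."""
    return start - start % unit, -(-end // unit) * unit

def ioremap_plan(kern_memmap, efi_map, start, size):
    """Return (attribute, kind of mapping) for [start, start + size)."""
    end = start + size
    for attr in ("WB", "UC"):                 # 1. attribute from kern_memmap
        if covered(kern_memmap, start, end, attr):
            return attr, "identity"
    if covered(efi_map, *round_out(start, end, GRANULE), "WB"):
        return "WB", "identity"               # 2. whole granule supports WB
    if covered(efi_map, *round_out(start, end, PAGE), "WB"):
        return "WB", "page table"             # 3. small WB pages suffice
    return "UC", "identity"                   # 4. fallback

# Hypothetical tables: (start, end_exclusive, attributes).
KERN_MEMMAP = [(0x1000000, 0x4000000, {"WB"})]
EFI_MAP = [
    (0x0000000, 0x00A0000, {"WB"}),
    (0x00A0000, 0x00C0000, {"UC"}),
    (0x00C0000, 0x5000000, {"WB"}),
]

print(ioremap_plan(KERN_MEMMAP, EFI_MAP, 0x1000000, 0x1000))   # ('WB', 'identity')
print(ioremap_plan(KERN_MEMMAP, EFI_MAP, 0x4000000, 0x1000))   # ('WB', 'identity')
print(ioremap_plan(KERN_MEMMAP, EFI_MAP, 0x00C0000, 0x20000))  # ('WB', 'page table')
print(ioremap_plan(KERN_MEMMAP, EFI_MAP, 0x00A0000, 0x20000))  # ('UC', 'identity')
```

	The second call models a granule that supports WB but is absent
	from kern_memmap; the third models a WB region that shares its
	granule with UC-only memory.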

PAST PROBLEM CASES

    mmap of various MMIO regions from /dev/mem by "X" on Intel platforms

      The EFI memory map may not report these MMIO regions.

      These must be allowed so that X will work.  This means that
      when the EFI memory map is incomplete, every /dev/mem mmap must
      succeed.  It may create either WB or UC user mappings, depending
      on whether the region is in kern_memmap or the EFI memory map.

    mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled

      See https://bugzilla.novell.com/show_bug.cgi?id=140858.

      The EFI memory map reports the following attributes:
        0x00000-0x9FFFF WB only
        0xA0000-0xBFFFF UC only (VGA frame buffer)
        0xC0000-0xFFFFF WB only

      This mmap is done with user pages, not kernel identity mappings,
      so it is safe to use WB mappings.

      The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
      which uses a granule-sized UC mapping.  This granule will cover some
      WB-only memory, but since UC is non-speculative, the processor will
      never generate an uncacheable reference to the WB-only areas unless
      the driver explicitly touches them.
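
      As a worked check of the user-page argument in this case, the
      sketch below walks the map listed above in 16K steps.  The
      helper names are hypothetical, the inclusive ranges from the
      listing are rewritten with exclusive ends, and the 16K page size
      is one of the two sizes mentioned under USER MAPPINGS.  Holes in
      the map are ignored, which is acceptable here only because the
      listing has none in the first megabyte.

```python
PAGE = 16 << 10  # 16K user pages

# The sx1000 map from the text, as (start, end_exclusive, attributes).
SX1000_VGA = [
    (0x00000, 0x0A0000, {"WB"}),
    (0xA0000, 0x0C0000, {"UC"}),   # VGA frame buffer
    (0xC0000, 0x100000, {"WB"}),
]

def attribute_sets(memmap, start, end):
    """Distinct attribute sets of the regions overlapping [start, end)."""
    return {frozenset(attrs) for r_start, r_end, attrs in memmap
            if r_start < end and start < r_end}

def homogeneous(memmap, start, end):
    """True if exactly one attribute set applies to [start, end)."""
    return len(attribute_sets(memmap, start, end)) == 1

# Every 16K page of the hwinfo mapping sees a single attribute set (WB).
pages = range(0x00000, 0xA0000, PAGE)
print(all(homogeneous(SX1000_VGA, p, p + PAGE) for p in pages))  # True

# The first megabyte as a whole mixes WB-only and UC-only regions.
print(homogeneous(SX1000_VGA, 0x00000, 0x100000))                # False
```

      Each small page is homogeneous, so a WB user mapping never
      overlaps the UC-only frame buffer, whereas any single cacheable
      mapping spanning the whole megabyte would.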

    mmap of 0x0-0xFFFFF legacy_mem by "X"

      If the EFI memory map reports that the entire range supports the
      same attributes, we can allow the mmap (and we will prefer WB if
      supported, as is the case with HP sx[12]000 machines with VGA
      disabled).

      If EFI reports the range as partly WB and partly UC (as on sx[12]000
      machines with VGA enabled), we must fail the mmap because there's no
      safe attribute to use.

      If EFI reports some of the range but not all (as on Intel firmware
      that doesn't report the VGA frame buffer at all), we should fail the
      mmap and force the user to map just the specific region of interest.
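
      The three outcomes above can be sketched as a userspace Python
      model.  The legacy_mem_attr() name and the map layout are
      hypothetical, and the attributes given for the Intel-style map
      are assumed for illustration; only the sx[12]000 layouts come
      from this document.  The sketch models the whole-range cases
      listed above and nothing else.

```python
def covered(memmap, start, end, attr):
    """True if every byte of [start, end) lies in a region supporting attr."""
    pos = start
    for r_start, r_end, attrs in sorted(memmap, key=lambda r: r[0]):
        if r_end <= pos:
            continue
        if r_start > pos or attr not in attrs:
            return False          # unreported hole, or attribute unsupported
        pos = r_end
        if pos >= end:
            return True
    return False

def legacy_mem_attr(efi_map, start, end):
    """Attribute for the mmap, or None if the mmap must fail."""
    for attr in ("WB", "UC"):     # prefer WB if supported
        if covered(efi_map, start, end, attr):
            return attr
    return None                   # mixed attributes, or partly unreported

# Maps as (start, end_exclusive, attributes).
VGA_DISABLED = [(0x00000, 0x100000, {"WB"})]
VGA_ENABLED = [
    (0x00000, 0x0A0000, {"WB"}),
    (0xA0000, 0x0C0000, {"UC"}),
    (0xC0000, 0x100000, {"WB"}),
]
# Assumed layout for firmware that omits the frame buffer entirely.
NO_FRAME_BUFFER = [
    (0x00000, 0x0A0000, {"WB"}),
    (0xC0000, 0x100000, {"WB"}),
]

print(legacy_mem_attr(VGA_DISABLED, 0x0, 0x100000))      # WB
print(legacy_mem_attr(VGA_ENABLED, 0x0, 0x100000))       # None
print(legacy_mem_attr(NO_FRAME_BUFFER, 0x0, 0x100000))   # None
print(legacy_mem_attr(NO_FRAME_BUFFER, 0x0, 0x0A0000))   # WB
```

      The last call shows the suggested workaround for an incomplete
      map: mapping only a region the firmware does describe succeeds.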

    mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled

      The EFI memory map reports the following attributes:
        0x00000-0xFFFFF WB only (no VGA MMIO hole)

      This is a special case of the previous case, and the mmap should
      fail for the same reason as above.

    read of /sys/devices/.../rom

      For VGA devices, this may cause an ioremap() of 0xC0000.  This
      used to be done with a UC mapping, because the VGA frame buffer
      at 0xA0000 prevents use of a WB granule.  The UC mapping causes
      an MCA on HP sx[12]000 chipsets.

      We should use WB page table mappings to avoid covering the VGA
      frame buffer.

NOTES

    [1] SDM rev 2.2, vol 2, sec 4.4.1.
    [2] SDM rev 2.2, vol 2, sec 4.4.6.