.. _memory_allocation:

=======================
Memory Allocation Guide
=======================

Linux provides a variety of APIs for memory allocation. You can
allocate small chunks using `kmalloc` or `kmem_cache_alloc` families,
large virtually contiguous areas using `vmalloc` and its derivatives,
or you can directly request pages from the page allocator with
`alloc_pages`. It is also possible to use more specialized allocators,
for instance `cma_alloc` or `zs_malloc`.

Most of the memory allocation APIs use GFP flags to express how that
memory should be allocated. The GFP acronym stands for "get free
pages", the name of the underlying memory allocation function.

The diversity of the allocation APIs, combined with the numerous GFP
flags, makes the question "How should I allocate memory?" hard to
answer, although very likely you should use

::

  kzalloc(<size>, GFP_KERNEL);

Of course, there are cases when other allocation APIs and different GFP
flags must be used.

Get Free Page flags
===================

The GFP flags control the allocator's behavior. They tell what memory
zones can be used, how hard the allocator should try to find free
memory, whether the memory can be accessed by userspace, etc. The
:ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>` provides
reference documentation for the GFP flags and their combinations, and
here we briefly outline their recommended usage:

  * Most of the time ``GFP_KERNEL`` is what you need. Memory for the
    kernel data structures, DMAable memory, inode cache, all these and
    many other allocation types can use ``GFP_KERNEL``. Note that
    using ``GFP_KERNEL`` implies ``__GFP_RECLAIM``, which means that
    direct reclaim may be triggered under memory pressure; the calling
    context must be allowed to sleep.
  * If the allocation is performed from an atomic context, e.g. an
    interrupt handler, use ``GFP_NOWAIT``. This flag prevents direct
    reclaim and IO or filesystem operations. Consequently, under memory
    pressure a ``GFP_NOWAIT`` allocation is likely to fail. Allocations
    which have a reasonable fallback should be using ``__GFP_NOWARN``.
  * If you think that accessing memory reserves is justified and the kernel
    will be stressed unless the allocation succeeds, you may use
    ``GFP_ATOMIC``.
  * Untrusted allocations triggered from userspace should be subject
    to kmem accounting and must have the ``__GFP_ACCOUNT`` bit set. There
    is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for ``GFP_KERNEL``
    allocations that should be accounted.
  * Userspace allocations should use one of the ``GFP_USER``,
    ``GFP_HIGHUSER`` or ``GFP_HIGHUSER_MOVABLE`` flags. The longer
    the flag name, the less restrictive it is.

    ``GFP_HIGHUSER_MOVABLE`` does not require that allocated memory
    will be directly accessible by the kernel and implies that the
    data is movable.

    ``GFP_HIGHUSER`` means that the allocated memory is not movable,
    but it is not required to be directly accessible by the kernel. An
    example may be a hardware allocation that maps data directly into
    userspace but has no addressing limitations.

    ``GFP_USER`` means that the allocated memory is not movable and it
    must be directly accessible by the kernel.

You may notice that quite a few allocations in the existing code
specify ``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to
prevent recursion deadlocks caused by direct memory reclaim calling
back into the FS or IO paths and blocking on already held
resources. Since 4.12 the preferred way to address this issue is to
use the scope APIs described in
:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`.

Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32``. They are
used to ensure that the allocated memory is accessible by hardware
with limited addressing capabilities. So unless you are writing a
driver for a device with such restrictions, avoid using these flags.
Even for hardware with such restrictions, it is preferable to use the
`dma_alloc*` APIs.

GFP flags and reclaim behavior
------------------------------
Memory allocations may trigger direct or background reclaim and it is
useful to understand how hard the page allocator will try to satisfy a
given request.

  * ``GFP_KERNEL & ~__GFP_RECLAIM`` - optimistic allocation without *any*
    attempt to free memory at all. The most lightweight mode, which does
    not even wake the background reclaim. Should be used carefully because
    it might deplete the memory and the next user might hit the more
    aggressive reclaim.

  * ``GFP_KERNEL & ~__GFP_DIRECT_RECLAIM`` (or ``GFP_NOWAIT``) - optimistic
    allocation without any attempt to free memory from the current
    context but can wake kswapd to reclaim memory if the zone is below
    the low watermark. Can be used from either atomic contexts or when
    the request is a performance optimization and there is another
    fallback for a slow path.

  * ``(GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM`` (aka ``GFP_ATOMIC``) -
    non-sleeping allocation with an expensive fallback so it can access
    some portion of memory reserves. Usually used from interrupt/bottom-half
    context with an expensive slow path fallback.

  * ``GFP_KERNEL`` - both background and direct reclaim are allowed and the
    **default** page allocator behavior is used. That means that non-costly
    allocation requests are basically no-fail, but there is no guarantee of
    that behavior, so failures have to be checked properly by callers
    (e.g. an OOM killer victim is currently allowed to fail).

  * ``GFP_KERNEL | __GFP_NORETRY`` - overrides the default allocator behavior
    and all allocation requests fail early rather than cause disruptive
    reclaim (one round of reclaim in the current implementation). The OOM
    killer is not invoked.

  * ``GFP_KERNEL | __GFP_RETRY_MAYFAIL`` - overrides the default allocator
    behavior and all allocation requests try really hard. The request
    will fail if the reclaim cannot make any progress. The OOM killer
    won't be triggered.

  * ``GFP_KERNEL | __GFP_NOFAIL`` - overrides the default allocator behavior
    and all allocation requests will loop endlessly until they succeed.
    This might be really dangerous, especially for larger orders.
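
The mask arithmetic in the list above can be explored outside the kernel
with a small userspace model. This is only a sketch: the bit values are
hypothetical, the flag compositions are simplified, and the two helper
functions are invented for illustration. The real definitions live in the
kernel's GFP headers and differ between versions.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>

  /* Hypothetical bit positions, for illustration only. */
  typedef unsigned int gfp_t;

  #define __GFP_HIGH            0x01u  /* may use part of the reserves */
  #define __GFP_IO              0x02u  /* may start physical IO */
  #define __GFP_FS              0x04u  /* may call into the filesystem */
  #define __GFP_DIRECT_RECLAIM  0x08u  /* caller may reclaim and sleep */
  #define __GFP_KSWAPD_RECLAIM  0x10u  /* may wake background reclaim */

  /* Simplified compositions that follow the descriptions above. */
  #define __GFP_RECLAIM  (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
  #define GFP_KERNEL     (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
  #define GFP_NOWAIT     (__GFP_KSWAPD_RECLAIM)
  #define GFP_ATOMIC     (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)

  /* Hypothetical helper: may the caller reclaim in its own context? */
  static bool may_direct_reclaim(gfp_t gfp)
  {
      return (gfp & __GFP_DIRECT_RECLAIM) != 0;
  }

  /* Hypothetical helper: may the request wake kswapd? */
  static bool may_wake_kswapd(gfp_t gfp)
  {
      return (gfp & __GFP_KSWAPD_RECLAIM) != 0;
  }

In this model, clearing ``__GFP_DIRECT_RECLAIM`` from ``GFP_KERNEL`` leaves
background reclaim enabled, while clearing ``__GFP_RECLAIM`` disables both
kinds of reclaim.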

Selecting memory allocator
==========================

The most straightforward way to allocate memory is to use a function
from the kmalloc() family. To be on the safe side, it is best to use
routines that set memory to zero, like kzalloc(). If you need to
allocate memory for an array, there are kmalloc_array() and kcalloc()
helpers. The helpers struct_size(), array_size() and array3_size() can
be used to safely calculate object sizes without overflowing.
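
To illustrate why the size helpers matter, here is a userspace sketch of
saturating size arithmetic. All ``model_*`` names are hypothetical
stand-ins, not kernel functions, and the sketch assumes a compiler that
provides the GCC/Clang ``__builtin_*_overflow`` builtins. The idea is that
an overflowed size saturates to ``SIZE_MAX``, so the allocation fails
instead of silently returning a buffer that is too small.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <stdlib.h>

  /* Stand-in for array_size(): saturate to SIZE_MAX on overflow. */
  static size_t model_array_size(size_t n, size_t size)
  {
      size_t bytes;

      if (__builtin_mul_overflow(n, size, &bytes))
          return SIZE_MAX;
      return bytes;
  }

  /* Stand-in for struct_size(): header plus n trailing elements. */
  static size_t model_struct_size(size_t header, size_t elem, size_t n)
  {
      size_t bytes = model_array_size(n, elem);

      if (bytes == SIZE_MAX || __builtin_add_overflow(bytes, header, &bytes))
          return SIZE_MAX;
      return bytes;
  }

  /* Stand-in for kcalloc(): zeroed array, NULL if the size overflowed. */
  static void *model_kcalloc(size_t n, size_t size)
  {
      size_t bytes = model_array_size(n, size);

      if (bytes == SIZE_MAX)
          return NULL;    /* refuse the overflowed request */
      return calloc(1, bytes);
  }

A naive ``n * size`` would wrap around for large ``n`` and yield a small
allocation; the saturating variant turns that case into an ordinary
allocation failure that the caller already has to handle.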

The maximum size of a chunk that can be allocated with `kmalloc` is
limited. The actual limit depends on the hardware and the kernel
configuration, but it is good practice to use `kmalloc` for objects
smaller than page size.

The address of a chunk allocated with `kmalloc` is aligned to at least
ARCH_KMALLOC_MINALIGN bytes.  For sizes which are a power of two, the
alignment is also guaranteed to be at least the respective size.

Chunks allocated with kmalloc() can be resized with krealloc(). As with
kmalloc_array(), a helper for resizing arrays is provided in the form of
krealloc_array().
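
A common pitfall when resizing is to overwrite the only pointer to the
old buffer with a failed result. The userspace sketch below models the
usual idiom with ``realloc()``; ``model_krealloc_array()`` and
``grow_ints()`` are hypothetical names, and the sketch assumes callers
pass a non-zero element count.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <stdlib.h>

  /*
   * Stand-in for krealloc_array(): returns NULL on overflow or on
   * allocation failure and leaves the original buffer valid in both cases.
   */
  static void *model_krealloc_array(void *p, size_t new_n, size_t size)
  {
      size_t bytes;

      if (__builtin_mul_overflow(new_n, size, &bytes))
          return NULL;
      return realloc(p, bytes);
  }

  /* Grow an int array; on failure *arr still points at the old data. */
  static int grow_ints(int **arr, size_t new_n)
  {
      int *tmp = model_krealloc_array(*arr, new_n, sizeof(**arr));

      if (!tmp)
          return -1;      /* kernel code would typically return -ENOMEM */
      *arr = tmp;         /* commit only after the resize succeeded */
      return 0;
  }

Assigning the result to a temporary first keeps the old buffer reachable,
so the caller can continue using it or free it on the error path.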

For large allocations you can use vmalloc() and vzalloc(), or directly
request pages from the page allocator. The memory allocated by `vmalloc`
and related functions is not physically contiguous.

If you are not sure whether the allocation size is too large for
`kmalloc`, it is possible to use kvmalloc() and its derivatives. They
try to allocate memory with `kmalloc` and, if that allocation fails,
retry with `vmalloc`. There are restrictions on which GFP
flags can be used with `kvmalloc`; please see kvmalloc_node() reference
documentation. Note that `kvmalloc` may return memory that is not
physically contiguous.
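
The fallback logic described above can be pictured with a small userspace
model. Everything here is hypothetical: the ``model_*`` functions, the
``MODEL_KMALLOC_MAX`` ceiling and the ``struct model_buf`` wrapper exist
only for this sketch, and the real kvmalloc() returns a plain pointer and
applies additional rules that are not modeled.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdlib.h>

  /* Hypothetical ceiling for the "physically contiguous" stand-in. */
  #define MODEL_KMALLOC_MAX 4096

  struct model_buf {
      void *mem;
      bool from_vmalloc;  /* records which path served the request */
  };

  /* Stand-in for kmalloc(): fails for requests above the ceiling. */
  static void *model_kmalloc(size_t size)
  {
      return size <= MODEL_KMALLOC_MAX ? malloc(size) : NULL;
  }

  /* Stand-in for vmalloc(): accepts large requests. */
  static void *model_vmalloc(size_t size)
  {
      return malloc(size);
  }

  /* Try the contiguous path first, then fall back. */
  static struct model_buf model_kvmalloc(size_t size)
  {
      struct model_buf b = { model_kmalloc(size), false };

      if (!b.mem) {
          b.mem = model_vmalloc(size);
          b.from_vmalloc = b.mem != NULL;
      }
      return b;
  }

  /* One release function for both paths, as kvfree() provides. */
  static void model_kvfree(struct model_buf *b)
  {
      free(b->mem);
      b->mem = NULL;
  }

Because the caller cannot know in advance which path will serve a request,
code using this pattern must not assume physical contiguity and must
release the memory with the matching combined free function.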

If you need to allocate many identical objects you can use the slab
cache allocator. The cache should be set up with kmem_cache_create() or
kmem_cache_create_usercopy() before it can be used. The second function
should be used if a part of the cache might be copied to userspace.
After the cache is created, kmem_cache_alloc() and its convenience
wrappers can allocate memory from that cache.
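
The create, allocate, free, destroy lifecycle of such a cache can be
sketched with a toy userspace free-list pool. This is not how the slab
allocator is implemented; the ``model_cache_*`` names are hypothetical and
the pool only illustrates why fixed-size objects are cheap to recycle and
why a cache must be empty before it is destroyed.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdlib.h>
  #include <string.h>

  struct model_cache {
      size_t obj_size;    /* every object in the cache has this size */
      void *free_list;    /* freed objects, chained through their first word */
      size_t live;        /* objects handed out and not yet freed */
  };

  static struct model_cache *model_cache_create(size_t obj_size)
  {
      struct model_cache *c = malloc(sizeof(*c));

      if (!c)
          return NULL;
      /* Each object must be able to hold the free-list link. */
      c->obj_size = obj_size < sizeof(void *) ? sizeof(void *) : obj_size;
      c->free_list = NULL;
      c->live = 0;
      return c;
  }

  static void *model_cache_alloc(struct model_cache *c)
  {
      void *obj = c->free_list;

      if (obj)
          memcpy(&c->free_list, obj, sizeof(void *));  /* pop recycled object */
      else
          obj = malloc(c->obj_size);                   /* grow the pool */
      if (obj)
          c->live++;
      return obj;
  }

  static void model_cache_free(struct model_cache *c, void *obj)
  {
      memcpy(obj, &c->free_list, sizeof(void *));      /* push onto free list */
      c->free_list = obj;
      c->live--;
  }

  /* Returns -1 and keeps the cache if objects are still outstanding. */
  static int model_cache_destroy(struct model_cache *c)
  {
      if (c->live)
          return -1;
      while (c->free_list) {
          void *next;

          memcpy(&next, c->free_list, sizeof(void *));
          free(c->free_list);
          c->free_list = next;
      }
      free(c);
      return 0;
  }

In this toy model a freed object is handed out again by the next
allocation, and destroying the cache is refused while any object is still
in use.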

When the allocated memory is no longer needed it must be freed.

Objects allocated by `kmalloc` can be freed by `kfree` or `kvfree`. Objects
allocated by `kmem_cache_alloc` can be freed with `kmem_cache_free`, `kfree`
or `kvfree`, where the latter two might be more convenient thanks to not
needing the kmem_cache pointer.

The same rules apply to _bulk and _rcu flavors of freeing functions.

Memory allocated by `vmalloc` can be freed with `vfree` or `kvfree`.
Memory allocated by `kvmalloc` can be freed with `kvfree`.
Caches created by `kmem_cache_create` should be freed with
`kmem_cache_destroy` only after freeing all the allocated objects first.