Searched +hist:8 +hist:eb8284b (Results 1 - 4 of 4) sorted by relevance
/linux-master/mm/ | ||
H A D | slab.h | diff 3c615294 Fri Jul 14 00:44:22 MDT 2023 GONG, Ruiqi <gongruiqi@huaweicloud.com> Randomized slab caches for kmalloc() When exploiting memory vulnerabilities, "heap spraying" is a common technique targeting those related to dynamic memory allocation (i.e. the "heap"), and it plays an important role in a successful exploitation. Basically, it is to overwrite the memory area of vulnerable object by triggering allocation in other subsystems or modules and therefore getting a reference to the targeted memory location. It's usable on various types of vulnerablity including use after free (UAF), heap out- of-bound write and etc. There are (at least) two reasons why the heap can be sprayed: 1) generic slab caches are shared among different subsystems and modules, and 2) dedicated slab caches could be merged with the generic ones. Currently these two factors cannot be prevented at a low cost: the first one is a widely used memory allocation mechanism, and shutting down slab merging completely via `slub_nomerge` would be overkill. To efficiently prevent heap spraying, we propose the following approach: to create multiple copies of generic slab caches that will never be merged, and random one of them will be used at allocation. The random selection is based on the address of code that calls `kmalloc()`, which means it is static at runtime (rather than dynamically determined at each time of allocation, which could be bypassed by repeatedly spraying in brute force). In other words, the randomness of cache selection will be with respect to the code address rather than time, i.e. allocations in different code paths would most likely pick different caches, although kmalloc() at each place would use the same cache copy whenever it is executed. In this way, the vulnerable object and memory allocated in other subsystems and modules will (most probably) be on different slab caches, which prevents the object from being sprayed. Meanwhile, the static random selection is further enhanced with a per-boot random seed, which prevents the attacker from finding a usable kmalloc that happens to pick the same cache with the vulnerable subsystem/module by analyzing the open source code. In other words, with the per-boot seed, the random selection is static during each time the system starts and runs, but not across different system startups. The overhead of performance has been tested on a 40-core x86 server by comparing the results of `perf bench all` between the kernels with and without this patch based on the latest linux-next kernel, which shows minor difference. A subset of benchmarks are listed below: sched/ sched/ syscall/ mem/ mem/ messaging pipe basic memcpy memset (sec) (sec) (sec) (GB/sec) (GB/sec) control1 0.019 5.459 0.733 15.258789 51.398026 control2 0.019 5.439 0.730 16.009221 48.828125 control3 0.019 5.282 0.735 16.009221 48.828125 control_avg 0.019 5.393 0.733 15.759077 49.684759 experiment1 0.019 5.374 0.741 15.500992 46.502976 experiment2 0.019 5.440 0.746 16.276042 51.398026 experiment3 0.019 5.242 0.752 15.258789 51.398026 experiment_avg 0.019 5.352 0.746 15.678608 49.766343 The overhead of memory usage was measured by executing `free` after boot on a QEMU VM with 1GB total memory, and as expected, it's positively correlated with # of cache copies: control 4 copies 8 copies 16 copies total 969.8M 968.2M 968.2M 968.2M used 20.0M 21.9M 24.1M 26.7M free 936.9M 933.6M 931.4M 928.6M available 932.2M 928.8M 926.6M 923.9M Co-developed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Dennis Zhou <dennis@kernel.org> # percpu Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 0c474d31 Mon Jun 12 09:31:47 MDT 2023 Catalin Marinas <catalin.marinas@arm.com> mm/slab: simplify create_kmalloc_cache() args and make it static In the slab variant of kmem_cache_init(), call new_kmalloc_cache() instead of initialising the kmalloc_caches array directly. With this, create_kmalloc_cache() is now only called from new_kmalloc_cache() in the same file, so make it static. In addition, the useroffset argument is always 0 while usersize is the same as size. Remove them. Link: https://lkml.kernel.org/r/20230612153201.554742-4-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Saravana Kannan <saravanak@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> diff b539ce9f Fri Oct 21 13:18:12 MDT 2022 Jiri Kosina <jkosina@suse.cz> mm/slab: Annotate kmem_cache_node->list_lock as raw The list_lock can be taken in hardirq context when do_drain() is being called via IPI on all cores, and therefore lockdep complains about it, because it can't be preempted on PREEMPT_RT. That's not a real issue, as SLAB can't be built on PREEMPT_RT anyway, but we still want to get rid of the warning on non-PREEMPT_RT builds. Annotate it therefore as a raw lock in order to get rid of he lockdep warning below. ============================= [ BUG: Invalid wait context ] 6.1.0-rc1-00134-ge35184f32151 #4 Not tainted ----------------------------- swapper/3/0 is trying to lock: ffff8bc88086dc18 (&parent->list_lock){..-.}-{3:3}, at: do_drain+0x57/0xb0 other info that might help us debug this: context-{2:2} no locks held by swapper/3/0. stack backtrace: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.1.0-rc1-00134-ge35184f32151 #4 Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017 Call Trace: <IRQ> dump_stack_lvl+0x6b/0x9d __lock_acquire+0x1519/0x1730 ? build_sched_domains+0x4bd/0x1590 ? __lock_acquire+0xad2/0x1730 lock_acquire+0x294/0x340 ? do_drain+0x57/0xb0 ? sched_clock_tick+0x41/0x60 _raw_spin_lock+0x2c/0x40 ? do_drain+0x57/0xb0 do_drain+0x57/0xb0 __flush_smp_call_function_queue+0x138/0x220 __sysvec_call_function+0x4f/0x210 sysvec_call_function+0x4b/0x90 </IRQ> <TASK> asm_sysvec_call_function+0x16/0x20 RIP: 0010:mwait_idle+0x5e/0x80 Code: 31 d2 65 48 8b 04 25 80 ed 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d 0b 78 46 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 06 fb 0f 1f 44 00 00 65 48 8b 04 25 80 ed 01 00 f0 80 60 02 df RSP: 0000:ffffa90940217ee0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9bb9f93a RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000001 R10: ffffa90940217ea8 R11: 0000000000000000 R12: ffffffffffffffff R13: 0000000000000000 R14: ffff8bc88127c500 R15: 0000000000000000 ? default_idle_call+0x1a/0xa0 default_idle_call+0x4b/0xa0 do_idle+0x1f1/0x2c0 ? _raw_spin_unlock_irqrestore+0x56/0x70 cpu_startup_entry+0x19/0x20 start_secondary+0x122/0x150 secondary_startup_64_no_verify+0xce/0xdb </TASK> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff b539ce9f Fri Oct 21 13:18:12 MDT 2022 Jiri Kosina <jkosina@suse.cz> mm/slab: Annotate kmem_cache_node->list_lock as raw The list_lock can be taken in hardirq context when do_drain() is being called via IPI on all cores, and therefore lockdep complains about it, because it can't be preempted on PREEMPT_RT. That's not a real issue, as SLAB can't be built on PREEMPT_RT anyway, but we still want to get rid of the warning on non-PREEMPT_RT builds. Annotate it therefore as a raw lock in order to get rid of he lockdep warning below. ============================= [ BUG: Invalid wait context ] 6.1.0-rc1-00134-ge35184f32151 #4 Not tainted ----------------------------- swapper/3/0 is trying to lock: ffff8bc88086dc18 (&parent->list_lock){..-.}-{3:3}, at: do_drain+0x57/0xb0 other info that might help us debug this: context-{2:2} no locks held by swapper/3/0. stack backtrace: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.1.0-rc1-00134-ge35184f32151 #4 Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017 Call Trace: <IRQ> dump_stack_lvl+0x6b/0x9d __lock_acquire+0x1519/0x1730 ? build_sched_domains+0x4bd/0x1590 ? __lock_acquire+0xad2/0x1730 lock_acquire+0x294/0x340 ? do_drain+0x57/0xb0 ? sched_clock_tick+0x41/0x60 _raw_spin_lock+0x2c/0x40 ? do_drain+0x57/0xb0 do_drain+0x57/0xb0 __flush_smp_call_function_queue+0x138/0x220 __sysvec_call_function+0x4f/0x210 sysvec_call_function+0x4b/0x90 </IRQ> <TASK> asm_sysvec_call_function+0x16/0x20 RIP: 0010:mwait_idle+0x5e/0x80 Code: 31 d2 65 48 8b 04 25 80 ed 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d 0b 78 46 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 06 fb 0f 1f 44 00 00 65 48 8b 04 25 80 ed 01 00 f0 80 60 02 df RSP: 0000:ffffa90940217ee0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9bb9f93a RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000001 R10: ffffa90940217ea8 R11: 0000000000000000 R12: ffffffffffffffff R13: 0000000000000000 R14: ffff8bc88127c500 R15: 0000000000000000 ? default_idle_call+0x1a/0xa0 default_idle_call+0x4b/0xa0 do_idle+0x1f1/0x2c0 ? _raw_spin_unlock_irqrestore+0x56/0x70 cpu_startup_entry+0x19/0x20 start_secondary+0x122/0x150 secondary_startup_64_no_verify+0xce/0xdb </TASK> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff b539ce9f Fri Oct 21 13:18:12 MDT 2022 Jiri Kosina <jkosina@suse.cz> mm/slab: Annotate kmem_cache_node->list_lock as raw The list_lock can be taken in hardirq context when do_drain() is being called via IPI on all cores, and therefore lockdep complains about it, because it can't be preempted on PREEMPT_RT. That's not a real issue, as SLAB can't be built on PREEMPT_RT anyway, but we still want to get rid of the warning on non-PREEMPT_RT builds. Annotate it therefore as a raw lock in order to get rid of he lockdep warning below. ============================= [ BUG: Invalid wait context ] 6.1.0-rc1-00134-ge35184f32151 #4 Not tainted ----------------------------- swapper/3/0 is trying to lock: ffff8bc88086dc18 (&parent->list_lock){..-.}-{3:3}, at: do_drain+0x57/0xb0 other info that might help us debug this: context-{2:2} no locks held by swapper/3/0. stack backtrace: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.1.0-rc1-00134-ge35184f32151 #4 Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017 Call Trace: <IRQ> dump_stack_lvl+0x6b/0x9d __lock_acquire+0x1519/0x1730 ? build_sched_domains+0x4bd/0x1590 ? __lock_acquire+0xad2/0x1730 lock_acquire+0x294/0x340 ? do_drain+0x57/0xb0 ? sched_clock_tick+0x41/0x60 _raw_spin_lock+0x2c/0x40 ? do_drain+0x57/0xb0 do_drain+0x57/0xb0 __flush_smp_call_function_queue+0x138/0x220 __sysvec_call_function+0x4f/0x210 sysvec_call_function+0x4b/0x90 </IRQ> <TASK> asm_sysvec_call_function+0x16/0x20 RIP: 0010:mwait_idle+0x5e/0x80 Code: 31 d2 65 48 8b 04 25 80 ed 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d 0b 78 46 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 06 fb 0f 1f 44 00 00 65 48 8b 04 25 80 ed 01 00 f0 80 60 02 df RSP: 0000:ffffa90940217ee0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9bb9f93a RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000001 R10: ffffa90940217ea8 R11: 0000000000000000 R12: ffffffffffffffff R13: 0000000000000000 R14: ffff8bc88127c500 R15: 0000000000000000 ? default_idle_call+0x1a/0xa0 default_idle_call+0x4b/0xa0 do_idle+0x1f1/0x2c0 ? _raw_spin_unlock_irqrestore+0x56/0x70 cpu_startup_entry+0x19/0x20 start_secondary+0x122/0x150 secondary_startup_64_no_verify+0xce/0xdb </TASK> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 8dfa9d55 Wed Aug 17 04:18:25 MDT 2022 Hyeonggon Yoo <42.hyeyoo@gmail.com> mm/slab_common: move declaration of __ksize() to mm/slab.h __ksize() is only called by KASAN. Remove export symbol and move declaration to mm/slab.h as we don't want to grow its callers. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff fdbcb2a6 Mon Jun 28 20:37:19 MDT 2021 Waiman Long <longman@redhat.com> mm/memcg: move mod_objcg_state() to memcontrol.c Patch series "mm/memcg: Reduce kmemcache memory accounting overhead", v6. With the recent introduction of the new slab memory controller, we eliminate the need for having separate kmemcaches for each memory cgroup and reduce overall kernel memory usage. However, we also add additional memory accounting overhead to each call of kmem_cache_alloc() and kmem_cache_free(). For workloads that require a lot of kmemcache allocations and de-allocations, they may experience performance regression as illustrated in [1] and [2]. A simple kernel module that performs repeated loop of 100,000,000 kmem_cache_alloc() and kmem_cache_free() of either a small 32-byte object or a big 4k object at module init time with a batch size of 4 (4 kmalloc's followed by 4 kfree's) is used for benchmarking. The benchmarking tool was run on a kernel based on linux-next-20210419. The test was run on a CascadeLake server with turbo-boosting disable to reduce run-to-run variation. The small object test exercises mainly the object stock charging and vmstat update code paths. The large object test also exercises the refill_obj_stock() and __memcg_kmem_charge()/__memcg_kmem_uncharge() code paths. With memory accounting disabled, the run time was 3.130s with both small object big object tests. With memory accounting enabled, both cgroup v1 and v2 showed similar results in the small object test. The performance results of the large object test, however, differed between cgroup v1 and v2. The execution times with the application of various patches in the patchset were: Applied patches Run time Accounting overhead %age 1 %age 2 --------------- -------- ------------------- ------ ------ Small 32-byte object: None 11.634s 8.504s 100.0% 271.7% 1-2 9.425s 6.295s 74.0% 201.1% 1-3 9.708s 6.578s 77.4% 210.2% 1-4 8.062s 4.932s 58.0% 157.6% Large 4k object (v2): None 22.107s 18.977s 100.0% 606.3% 1-2 20.960s 17.830s 94.0% 569.6% 1-3 14.238s 11.108s 58.5% 354.9% 1-4 11.329s 8.199s 43.2% 261.9% Large 4k object (v1): None 36.807s 33.677s 100.0% 1075.9% 1-2 36.648s 33.518s 99.5% 1070.9% 1-3 22.345s 19.215s 57.1% 613.9% 1-4 18.662s 15.532s 46.1% 496.2% N.B. %age 1 = overhead/unpatched overhead %age 2 = overhead/accounting disabled time Patch 2 (vmstat data stock caching) helps in both the small object test and the large v2 object test. It doesn't help much in v1 big object test. Patch 3 (refill_obj_stock improvement) does help the small object test but offer significant performance improvement for the large object test (both v1 and v2). Patch 4 (eliminating irq disable/enable) helps in all test cases. To test for the extreme case, a multi-threaded kmalloc/kfree microbenchmark was run on the 2-socket 48-core 96-thread system with 96 testing threads in the same memcg doing kmalloc+kfree of a 4k object with accounting enabled for 10s. The total number of kmalloc+kfree done in kilo operations per second (kops/s) were as follows: Applied patches v1 kops/s v1 change v2 kops/s v2 change --------------- --------- --------- --------- --------- None 3,520 1.00X 6,242 1.00X 1-2 4,304 1.22X 8,478 1.36X 1-3 4,731 1.34X 418,142 66.99X 1-4 4,587 1.30X 438,838 70.30X With memory accounting disabled, the kmalloc/kfree rate was 1,481,291 kop/s. This test shows how significant the memory accouting overhead can be in some extreme situations. For this multithreaded test, the improvement from patch 2 mainly comes from the conditional atomic xchg of objcg->nr_charged_bytes in mod_objcg_state(). By using an unconditional xchg, the operation rates were similar to the unpatched kernel. Patch 3 elminates the single highly contended cacheline of objcg->nr_charged_bytes for cgroup v2 leading to a huge performance improvement. Cgroup v1, however, still has another highly contended cacheline in the shared page counter &memcg->kmem. So the improvement is only modest. Patch 4 helps in cgroup v2, but performs worse in cgroup v1 as eliminating the irq_disable/irq_enable overhead seems to aggravate the cacheline contention. [1] https://lore.kernel.org/linux-mm/20210408193948.vfktg3azh2wrt56t@gabell/T/#u [2] https://lore.kernel.org/lkml/20210114025151.GA22932@xsang-OptiPlex-9020/ This patch (of 4): mod_objcg_state() is moved from mm/slab.h to mm/memcontrol.c so that further optimization can be done to it in later patches without exposing unnecessary details to other mm components. Link: https://lkml.kernel.org/r/20210506150007.16288-1-longman@redhat.com Link: https://lkml.kernel.org/r/20210506150007.16288-2-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Chris Down <chris@chrisdown.name> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 8e7f37f2 Mon Dec 07 18:41:02 MST 2020 Paul E. McKenney <paulmck@kernel.org> mm: Add mem_dump_obj() to print source of memory block There are kernel facilities such as per-CPU reference counts that give error messages in generic handlers or callbacks, whose messages are unenlightening. In the case of per-CPU reference-count underflow, this is not a problem when creating a new use of this facility because in that case the bug is almost certainly in the code implementing that new use. However, trouble arises when deploying across many systems, which might exercise corner cases that were not seen during development and testing. Here, it would be really nice to get some kind of hint as to which of several uses the underflow was caused by. This commit therefore exposes a mem_dump_obj() function that takes a pointer to memory (which must still be allocated if it has been dynamically allocated) and prints available information on where that memory came from. This pointer can reference the middle of the block as well as the beginning of the block, as needed by things like RCU callback functions and timer handlers that might not know where the beginning of the memory block is. These functions and handlers can use mem_dump_obj() to print out better hints as to where the problem might lie. The information printed can depend on kernel configuration. For example, the allocation return address can be printed only for slab and slub, and even then only when the necessary debug has been enabled. For slab, build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space to the next power of two or use the SLAB_STORE_USER when creating the kmem_cache structure. For slub, build with CONFIG_SLUB_DEBUG=y and boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create() if more focused use is desired. Also for slub, use CONFIG_STACKTRACE to enable printing of the allocation-time stack trace. Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <linux-mm@kvack.org> Reported-by: Andrii Nakryiko <andrii@kernel.org> [ paulmck: Convert to printing and change names per Joonsoo Kim. ] [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ] [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ] [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ] [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ] [ paulmck: Explicitly check for small pointers per Naresh Kamboju. ] Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> diff 286e04b8 Fri Aug 07 00:20:52 MDT 2020 Roman Gushchin <guro@fb.com> mm: memcg/slab: allocate obj_cgroups for non-root slab pages Allocate and release memory to store obj_cgroup pointers for each non-root slab page. Reuse page->mem_cgroup pointer to store a pointer to the allocated space. This commit temporarily increases the memory footprint of the kernel memory accounting. To store obj_cgroup pointers we'll need a place for an objcg_pointer for each allocated object. However, the following patches in the series will enable sharing of slab pages between memory cgroups, which will dramatically increase the total slab utilization. And the final memory footprint will be significantly smaller than before. To distinguish between obj_cgroups and memcg pointers in case when it's not obvious which one is used (as in page_cgroup_ino()), let's always set the lowest bit in the obj_cgroup case. The original obj_cgroups pointer is marked to be ignored by kmemleak, which otherwise would report a memory leak for each allocated vector. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-8-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 6d6ea1e9 Thu Mar 28 21:43:42 MDT 2019 Nicolas Boichat <drinkcat@chromium.org> mm: add support for kmem caches in DMA32 zone Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables", v6. This is a followup to the discussion in [1], [2]. IOMMUs using ARMv7 short-descriptor format require page tables (level 1 and 2) to be allocated within the first 4GB of RAM, even on 64-bit systems. For L1 tables that are bigger than a page, we can just use __get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still use GFP_DMA). For L2 tables that only take 1KB, it would be a waste to allocate a full page, so we considered 3 approaches: 1. This series, adding support for GFP_DMA32 slab caches. 2. genalloc, which requires pre-allocating the maximum number of L2 page tables (4096, so 4MB of memory). 3. page_frag, which is not very memory-efficient as it is unable to reuse freed fragments until the whole page is freed. [3] This series is the most memory-efficient approach. stable@ note: We confirmed that this is a regression, and IOMMU errors happen on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue most likely starts from commit ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek platforms (and maybe others?). [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html [2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html [3] https://patchwork.codeaurora.org/patch/671639/ This patch (of 3): IOMMUs using ARMv7 short-descriptor format require page tables to be allocated within the first 4GB of RAM, even on 64-bit systems. On arm64, this is done by passing GFP_DMA32 flag to memory allocation functions. For IOMMU L2 tables that only take 1KB, it would be a waste to allocate a full page using get_free_pages, so we considered 3 approaches: 1. This patch, adding support for GFP_DMA32 slab caches. 2. genalloc, which requires pre-allocating the maximum number of L2 page tables (4096, so 4MB of memory). 3. page_frag, which is not very memory-efficient as it is unable to reuse freed fragments until the whole page is freed. This change makes it possible to create a custom cache in DMA32 zone using kmem_cache_create, then allocate memory using kmem_cache_alloc. We do not create a DMA32 kmalloc cache array, as there are currently no users of kmalloc(..., GFP_DMA32). These calls will continue to trigger a warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK. This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32 kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and unnecessary). Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.org Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Sasha Levin <Alexander.Levin@microsoft.com> Cc: Huaisheng Ye <yehs1@lenovo.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Yong Wu <yong.wu@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Tomasz Figa <tfiga@google.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
H A D | slab_common.c | diff c15cdea5 Fri Oct 06 10:39:34 MDT 2023 Catalin Marinas <catalin.marinas@arm.com> mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign() Commit b035f5a6d852 ("mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible") allows architectures with non-coherent DMA to define a small ARCH_KMALLOC_MINALIGN (e.g. sizeof(unsigned long long)) and this has been enabled on arm64. With KASAN_HW_TAGS enabled, however, ARCH_SLAB_MINALIGN becomes 16 on arm64 (arch_slab_minalign() dynamically selects it since commit d949a8155d13 ("mm: make minimum slab alignment a runtime property")). This can lead to a situation where kmalloc-8 caches are attempted to be created with a kmem_caches.size aligned to 16. When the cache is mergeable, it can lead to kernel warnings like: sysfs: cannot create duplicate filename '/kernel/slab/:d-0000016' CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc1-00001-gda98843cd306-dirty #5 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x90/0xe8 show_stack+0x18/0x24 dump_stack_lvl+0x48/0x60 dump_stack+0x18/0x24 sysfs_warn_dup+0x64/0x80 sysfs_create_dir_ns+0xe8/0x108 kobject_add_internal+0x98/0x264 kobject_init_and_add+0x8c/0xd8 sysfs_slab_add+0x12c/0x248 slab_sysfs_init+0x98/0x14c do_one_initcall+0x6c/0x1b0 kernel_init_freeable+0x1c0/0x288 kernel_init+0x24/0x1e0 ret_from_fork+0x10/0x20 kobject: kobject_add_internal failed for :d-0000016 with -EEXIST, don't try to register things with the same name in the same directory. SLUB: Unable to add boot slab dma-kmalloc-8 to sysfs Limit the __kmalloc_minalign() return value (used to create the kmalloc-* caches) to arch_slab_minalign() so that kmalloc-8 caches are skipped when KASAN_HW_TAGS is enabled (both config and runtime). Reported-by: Mark Rutland <mark.rutland@arm.com> Fixes: b035f5a6d852 ("mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: stable@vger.kernel.org # 6.5.x Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff c15cdea5 Fri Oct 06 10:39:34 MDT 2023 Catalin Marinas <catalin.marinas@arm.com> mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign() Commit b035f5a6d852 ("mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible") allows architectures with non-coherent DMA to define a small ARCH_KMALLOC_MINALIGN (e.g. sizeof(unsigned long long)) and this has been enabled on arm64. With KASAN_HW_TAGS enabled, however, ARCH_SLAB_MINALIGN becomes 16 on arm64 (arch_slab_minalign() dynamically selects it since commit d949a8155d13 ("mm: make minimum slab alignment a runtime property")). This can lead to a situation where kmalloc-8 caches are attempted to be created with a kmem_caches.size aligned to 16. When the cache is mergeable, it can lead to kernel warnings like: sysfs: cannot create duplicate filename '/kernel/slab/:d-0000016' CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc1-00001-gda98843cd306-dirty #5 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x90/0xe8 show_stack+0x18/0x24 dump_stack_lvl+0x48/0x60 dump_stack+0x18/0x24 sysfs_warn_dup+0x64/0x80 sysfs_create_dir_ns+0xe8/0x108 kobject_add_internal+0x98/0x264 kobject_init_and_add+0x8c/0xd8 sysfs_slab_add+0x12c/0x248 slab_sysfs_init+0x98/0x14c do_one_initcall+0x6c/0x1b0 kernel_init_freeable+0x1c0/0x288 kernel_init+0x24/0x1e0 ret_from_fork+0x10/0x20 kobject: kobject_add_internal failed for :d-0000016 with -EEXIST, don't try to register things with the same name in the same directory. SLUB: Unable to add boot slab dma-kmalloc-8 to sysfs Limit the __kmalloc_minalign() return value (used to create the kmalloc-* caches) to arch_slab_minalign() so that kmalloc-8 caches are skipped when KASAN_HW_TAGS is enabled (both config and runtime). Reported-by: Mark Rutland <mark.rutland@arm.com> Fixes: b035f5a6d852 ("mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: stable@vger.kernel.org # 6.5.x Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff c15cdea5 Fri Oct 06 10:39:34 MDT 2023 Catalin Marinas <catalin.marinas@arm.com> mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign() Commit b035f5a6d852 ("mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible") allows architectures with non-coherent DMA to define a small ARCH_KMALLOC_MINALIGN (e.g. sizeof(unsigned long long)) and this has been enabled on arm64. With KASAN_HW_TAGS enabled, however, ARCH_SLAB_MINALIGN becomes 16 on arm64 (arch_slab_minalign() dynamically selects it since commit d949a8155d13 ("mm: make minimum slab alignment a runtime property")). This can lead to a situation where kmalloc-8 caches are attempted to be created with a kmem_caches.size aligned to 16. When the cache is mergeable, it can lead to kernel warnings like: sysfs: cannot create duplicate filename '/kernel/slab/:d-0000016' CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc1-00001-gda98843cd306-dirty #5 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x90/0xe8 show_stack+0x18/0x24 dump_stack_lvl+0x48/0x60 dump_stack+0x18/0x24 sysfs_warn_dup+0x64/0x80 sysfs_create_dir_ns+0xe8/0x108 kobject_add_internal+0x98/0x264 kobject_init_and_add+0x8c/0xd8 sysfs_slab_add+0x12c/0x248 slab_sysfs_init+0x98/0x14c do_one_initcall+0x6c/0x1b0 kernel_init_freeable+0x1c0/0x288 kernel_init+0x24/0x1e0 ret_from_fork+0x10/0x20 kobject: kobject_add_internal failed for :d-0000016 with -EEXIST, don't try to register things with the same name in the same directory. SLUB: Unable to add boot slab dma-kmalloc-8 to sysfs Limit the __kmalloc_minalign() return value (used to create the kmalloc-* caches) to arch_slab_minalign() so that kmalloc-8 caches are skipped when KASAN_HW_TAGS is enabled (both config and runtime). Reported-by: Mark Rutland <mark.rutland@arm.com> Fixes: b035f5a6d852 ("mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: stable@vger.kernel.org # 6.5.x Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 3c615294 Fri Jul 14 00:44:22 MDT 2023 GONG, Ruiqi <gongruiqi@huaweicloud.com> Randomized slab caches for kmalloc() When exploiting memory vulnerabilities, "heap spraying" is a common technique targeting those related to dynamic memory allocation (i.e. the "heap"), and it plays an important role in a successful exploitation. Basically, it is to overwrite the memory area of vulnerable object by triggering allocation in other subsystems or modules and therefore getting a reference to the targeted memory location. It's usable on various types of vulnerablity including use after free (UAF), heap out- of-bound write and etc. There are (at least) two reasons why the heap can be sprayed: 1) generic slab caches are shared among different subsystems and modules, and 2) dedicated slab caches could be merged with the generic ones. Currently these two factors cannot be prevented at a low cost: the first one is a widely used memory allocation mechanism, and shutting down slab merging completely via `slub_nomerge` would be overkill. To efficiently prevent heap spraying, we propose the following approach: to create multiple copies of generic slab caches that will never be merged, and random one of them will be used at allocation. The random selection is based on the address of code that calls `kmalloc()`, which means it is static at runtime (rather than dynamically determined at each time of allocation, which could be bypassed by repeatedly spraying in brute force). In other words, the randomness of cache selection will be with respect to the code address rather than time, i.e. allocations in different code paths would most likely pick different caches, although kmalloc() at each place would use the same cache copy whenever it is executed. In this way, the vulnerable object and memory allocated in other subsystems and modules will (most probably) be on different slab caches, which prevents the object from being sprayed. Meanwhile, the static random selection is further enhanced with a per-boot random seed, which prevents the attacker from finding a usable kmalloc that happens to pick the same cache with the vulnerable subsystem/module by analyzing the open source code. In other words, with the per-boot seed, the random selection is static during each time the system starts and runs, but not across different system startups. The overhead of performance has been tested on a 40-core x86 server by comparing the results of `perf bench all` between the kernels with and without this patch based on the latest linux-next kernel, which shows minor difference. A subset of benchmarks are listed below: sched/ sched/ syscall/ mem/ mem/ messaging pipe basic memcpy memset (sec) (sec) (sec) (GB/sec) (GB/sec) control1 0.019 5.459 0.733 15.258789 51.398026 control2 0.019 5.439 0.730 16.009221 48.828125 control3 0.019 5.282 0.735 16.009221 48.828125 control_avg 0.019 5.393 0.733 15.759077 49.684759 experiment1 0.019 5.374 0.741 15.500992 46.502976 experiment2 0.019 5.440 0.746 16.276042 51.398026 experiment3 0.019 5.242 0.752 15.258789 51.398026 experiment_avg 0.019 5.352 0.746 15.678608 49.766343 The overhead of memory usage was measured by executing `free` after boot on a QEMU VM with 1GB total memory, and as expected, it's positively correlated with # of cache copies: control 4 copies 8 copies 16 copies total 969.8M 968.2M 968.2M 968.2M used 20.0M 21.9M 24.1M 26.7M free 936.9M 933.6M 931.4M 928.6M available 932.2M 928.8M 926.6M 923.9M Co-developed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Dennis Zhou <dennis@kernel.org> # percpu Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff b035f5a6 Mon Jun 12 09:32:00 MDT 2023 Catalin Marinas <catalin.marinas@arm.com> mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible If an architecture opted in to DMA bouncing of unaligned kmalloc() buffers (ARCH_WANT_KMALLOC_DMA_BOUNCE), reduce the minimum kmalloc() cache alignment below cache-line size to ARCH_KMALLOC_MINALIGN. Link: https://lkml.kernel.org/r/20230612153201.554742-17-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> diff 963e84b0 Mon Jun 12 09:31:48 MDT 2023 Catalin Marinas <catalin.marinas@arm.com> mm/slab: limit kmalloc() minimum alignment to dma_get_cache_alignment() Do not create kmalloc() caches which are not aligned to dma_get_cache_alignment(). There is no functional change since for current architectures defining ARCH_DMA_MINALIGN, ARCH_KMALLOC_MINALIGN equals ARCH_DMA_MINALIGN (and dma_get_cache_alignment()). On architectures without a specific ARCH_DMA_MINALIGN, dma_get_cache_alignment() is 1, so no change to the kmalloc() caches. Link: https://lkml.kernel.org/r/20230612153201.554742-5-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> diff 0c474d31 Mon Jun 12 09:31:47 MDT 2023 Catalin Marinas <catalin.marinas@arm.com> mm/slab: simplify create_kmalloc_cache() args and make it static In the slab variant of kmem_cache_init(), call new_kmalloc_cache() instead of initialising the kmalloc_caches array directly. With this, create_kmalloc_cache() is now only called from new_kmalloc_cache() in the same file, so make it static. In addition, the useroffset argument is always 0 while usersize is the same as size. Remove them. Link: https://lkml.kernel.org/r/20230612153201.554742-4-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Saravana Kannan <saravanak@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> diff 9ed9cac1 Fri Sep 23 14:28:07 MDT 2022 Kees Cook <keescook@chromium.org> slab: Remove __malloc attribute from realloc functions The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed. Thanks to Geert Uytterhoeven <geert@linux-m68k.org> for reporting build failures with gcc-8 in earlier version which tried to remove the #ifdef. While the "alloc_size" attribute is available on all GCC versions, I forgot that it gets disabled explicitly by the kernel in GCC < 9.1 due to misbehaviors. Add a note to the compiler_attributes.h entry for it. Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Marco Elver <elver@google.com> Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 6edf2576 Tue Sep 13 00:54:20 MDT 2022 Feng Tang <feng.tang@intel.com> mm/slub: enable debugging memory wasting of kmalloc kmalloc's API family is critical for mm, with one nature that it will round up the request size to a fixed one (mostly power of 2). Say when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes could be allocated, so in worst case, there is around 50% memory space waste. The wastage is not a big issue for requests that get allocated/freed quickly, but may cause problems with objects that have longer life time. We've met a kernel boot OOM panic (v5.10), and from the dumped slab info: [ 26.062145] kmalloc-2k 814056KB 814056KB From debug we found there are huge number of 'struct iova_magazine', whose size is 1032 bytes (1024 + 8), so each allocation will waste 1016 bytes. Though the issue was solved by giving the right (bigger) size of RAM, it is still nice to optimize the size (either use a kmalloc friendly size or create a dedicated slab for it). And from lkml archive, there was another crash kernel OOM case [1] back in 2019, which seems to be related with the similar slab waste situation, as the log is similar: [ 4.332648] iommu: Adding device 0000:20:02.0 to group 16 [ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0 ... [ 4.857565] kmalloc-2048 59164KB 59164KB The crash kernel only has 256M memory, and 59M is pretty big here. (Note: the related code has been changed and optimised in recent kernel [2], these logs are just picked to demo the problem, also a patch changing its size to 1024 bytes has been merged) So add an way to track each kmalloc's memory waste info, and leverage the existing SLUB debug framework (specifically SLUB_STORE_USER) to show its call stack of original allocation, so that user can evaluate the waste situation, identify some hot spots and optimize accordingly, for a better utilization of memory. The waste info is integrated into existing interface: '/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of 'kmalloc-4k' after boot is: 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1 __kmem_cache_alloc_node+0x11f/0x4e0 __kmalloc_node+0x4e/0x140 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe] ixgbe_probe+0x165f/0x1d20 [ixgbe] local_pci_probe+0x78/0xc0 work_for_cpu_fn+0x26/0x40 ... which means in 'kmalloc-4k' slab, there are 126 requests of 2240 bytes which got a 4KB space (wasting 1856 bytes each and 233856 bytes in total), from ixgbe_alloc_q_vector(). And when system starts some real workload like multiple docker instances, there could are more severe waste. [1]. https://lkml.org/lkml/2019/8/12/266 [2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/ [Thanks Hyeonggon for pointing out several bugs about sorting/format] [Thanks Vlastimil for suggesting way to reduce memory usage of orig_size and keep it only for kmalloc objects] Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: John Garry <john.garry@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 6edf2576 Tue Sep 13 00:54:20 MDT 2022 Feng Tang <feng.tang@intel.com> mm/slub: enable debugging memory wasting of kmalloc kmalloc's API family is critical for mm, with one nature that it will round up the request size to a fixed one (mostly power of 2). Say when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes could be allocated, so in worst case, there is around 50% memory space waste. The wastage is not a big issue for requests that get allocated/freed quickly, but may cause problems with objects that have longer life time. We've met a kernel boot OOM panic (v5.10), and from the dumped slab info: [ 26.062145] kmalloc-2k 814056KB 814056KB From debug we found there are huge number of 'struct iova_magazine', whose size is 1032 bytes (1024 + 8), so each allocation will waste 1016 bytes. Though the issue was solved by giving the right (bigger) size of RAM, it is still nice to optimize the size (either use a kmalloc friendly size or create a dedicated slab for it). And from lkml archive, there was another crash kernel OOM case [1] back in 2019, which seems to be related with the similar slab waste situation, as the log is similar: [ 4.332648] iommu: Adding device 0000:20:02.0 to group 16 [ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0 ... [ 4.857565] kmalloc-2048 59164KB 59164KB The crash kernel only has 256M memory, and 59M is pretty big here. (Note: the related code has been changed and optimised in recent kernel [2], these logs are just picked to demo the problem, also a patch changing its size to 1024 bytes has been merged) So add an way to track each kmalloc's memory waste info, and leverage the existing SLUB debug framework (specifically SLUB_STORE_USER) to show its call stack of original allocation, so that user can evaluate the waste situation, identify some hot spots and optimize accordingly, for a better utilization of memory. The waste info is integrated into existing interface: '/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of 'kmalloc-4k' after boot is: 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1 __kmem_cache_alloc_node+0x11f/0x4e0 __kmalloc_node+0x4e/0x140 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe] ixgbe_probe+0x165f/0x1d20 [ixgbe] local_pci_probe+0x78/0xc0 work_for_cpu_fn+0x26/0x40 ... which means in 'kmalloc-4k' slab, there are 126 requests of 2240 bytes which got a 4KB space (wasting 1856 bytes each and 233856 bytes in total), from ixgbe_alloc_q_vector(). And when system starts some real workload like multiple docker instances, there could are more severe waste. [1]. https://lkml.org/lkml/2019/8/12/266 [2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/ [Thanks Hyeonggon for pointing out several bugs about sorting/format] [Thanks Vlastimil for suggesting way to reduce memory usage of orig_size and keep it only for kmalloc objects] Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: John Garry <john.garry@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> |
H A D | slub.c | diff c94d2224 Tue Feb 20 17:12:54 MST 2024 Chengming Zhou <chengming.zhou@linux.dev> mm, slab: fix the comment of cpu partial list The partial slabs on cpu partial list are not frozen after the commit 8cd3fa428b56 ("slub: Delay freezing of partial slabs") merged. So fix the comment. Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 3450a0e5 Mon Nov 13 10:04:05 MST 2023 Vlastimil Babka <vbabka@suse.cz> mm/slub: optimize alloc fastpath code layout With allocation fastpaths no longer divided between two .c files, we have better inlining, however checking the disassembly of kmem_cache_alloc() reveals we can do better to make the fastpaths smaller and move the less common situations out of line or to separate functions, to reduce instruction cache pressure. - split memcg pre/post alloc hooks to inlined checks that use likely() to assume there will be no objcg handling necessary, and non-inline functions doing the actual handling - add some more likely/unlikely() to pre/post alloc hooks to indicate which scenarios should be out of line - change gfp_allowed_mask handling in slab_post_alloc_hook() so the code can be optimized away when kasan/kmsan/kmemleak is configured out bloat-o-meter shows: add/remove: 4/2 grow/shrink: 1/8 up/down: 521/-2924 (-2403) Function old new delta __memcg_slab_post_alloc_hook - 461 +461 kmem_cache_alloc_bulk 775 791 +16 __pfx_should_failslab.constprop - 16 +16 __pfx___memcg_slab_post_alloc_hook - 16 +16 should_failslab.constprop - 12 +12 __pfx_memcg_slab_post_alloc_hook 16 - -16 kmem_cache_alloc_lru 1295 1023 -272 kmem_cache_alloc_node 1118 817 -301 kmem_cache_alloc 1076 772 -304 kmalloc_node_trace 1149 838 -311 kmalloc_trace 1102 789 -313 __kmalloc_node_track_caller 1393 1080 -313 __kmalloc_node 1397 1082 -315 __kmalloc 1374 1059 -315 memcg_slab_post_alloc_hook 464 - -464 Note that gcc still decided to inline __memcg_pre_alloc_hook(), but the code is out of line. Forcing noinline did not improve the results. As a result the fastpaths are shorter and overal code size is reduced. Acked-by: David Rientjes <rientjes@google.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 8cd3fa42 Wed Nov 01 21:23:27 MDT 2023 Chengming Zhou <zhouchengming@bytedance.com> slub: Delay freezing of partial slabs Now we will freeze slabs when moving them out of node partial list to cpu partial list, this method needs two cmpxchg_double operations: 1. freeze slab (acquire_slab()) under the node list_lock 2. get_freelist() when pick used in ___slab_alloc() Actually we don't need to freeze when moving slabs out of node partial list, we can delay freezing to when use slab freelist in ___slab_alloc(), so we can save one cmpxchg_double(). And there are other good points: - The moving of slabs between node partial list and cpu partial list becomes simpler, since we don't need to freeze or unfreeze at all. - The node list_lock contention would be less, since we don't need to freeze any slab under the node list_lock. We can achieve this because there is no concurrent path would manipulate the partial slab list except the __slab_free() path, which is now serialized by slab_test_node_partial() under the list_lock. Since the slab returned by get_partial() interfaces is not frozen anymore and no freelist is returned in the partial_context, so we need to use the introduced freeze_slab() to freeze it and get its freelist. Similarly, the slabs on the CPU partial list are not frozen anymore, we need to freeze_slab() on it before use. We can now delete acquire_slab() as it became unused. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 8a399e2f Wed Nov 01 21:23:24 MDT 2023 Chengming Zhou <zhouchengming@bytedance.com> slub: Keep track of whether slub is on the per-node partial list Now we rely on the "frozen" bit to see if we should manipulate the slab->slab_list, which will be changed in the following patch. Instead we introduce another way to keep track of whether slub is on the per-node partial list, here we reuse the PG_workingset bit. We have to use the atomic set_bit() and clear_bit() variants and change slab_unlock() to bit_spin_unlock() because when cmpxchg is not available and PG_lock is used, there may be concurrent operations on the two bits. Thanks to Mark Brown for reporting a hang and testing of a previous version where the non-atomic operations were used. Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff be784ba8 Mon Nov 21 08:58:39 MST 2022 Vlastimil Babka <vbabka@suse.cz> mm, slub: don't aggressively inline with CONFIG_SLUB_TINY SLUB fastpaths use __always_inline to avoid function calls. With CONFIG_SLUB_TINY we would rather save the memory. Add a __fastpath_inline macro that's __always_inline normally but empty with CONFIG_SLUB_TINY. bloat-o-meter results on x86_64 mm/slub.o: add/remove: 3/1 grow/shrink: 1/8 up/down: 865/-1784 (-919) Function old new delta kmem_cache_free 20 281 +261 slab_alloc_node.isra - 245 +245 slab_free.constprop.isra - 231 +231 __kmem_cache_alloc_lru.isra - 128 +128 __kmem_cache_release 88 83 -5 __kmem_cache_create 1446 1436 -10 __kmem_cache_free 271 142 -129 kmem_cache_alloc_node 330 127 -203 kmem_cache_free_bulk.part 826 613 -213 __kmem_cache_alloc_node 230 10 -220 kmem_cache_alloc_lru 325 12 -313 kmem_cache_alloc 325 10 -315 kmem_cache_free.part 376 - -376 Total: Before=26103, After=25184, chg -3.52% Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Christoph Lameter <cl@linux.com> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> diff 90ce872c Mon Nov 21 03:44:54 MST 2022 Vlastimil Babka <vbabka@suse.cz> mm, slub: lower the default slub_max_order with CONFIG_SLUB_TINY With CONFIG_SLUB_TINY we want to minimize memory overhead. By lowering the default slub_max_order we can make slab allocations use smaller pages. However depending on object sizes, order-0 might not be the best due to increased fragmentation. When testing on a 8MB RAM k210 system by Damien Le Moal [1], slub_max_order=1 had the best results, so use that as the default for CONFIG_SLUB_TINY. [1] https://lore.kernel.org/all/6a1883c4-4c3f-545a-90e8-2cd805bcf4ae@opensource.wdc.com/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Christoph Lameter <cl@linux.com> diff 8b881763 Fri Nov 04 08:57:26 MDT 2022 Vlastimil Babka <vbabka@suse.cz> mm/migrate: make isolate_movable_page() skip slab pages In the next commit we want to rearrange struct slab fields to allow a larger rcu_head. Afterwards, the page->mapping field will overlap with SLUB's "struct list_head slab_list", where the value of prev pointer can become LIST_POISON2, which is 0x122 + POISON_POINTER_DELTA. Unfortunately the bit 1 being set can confuse PageMovable() to be a false positive and cause a GPF as reported by lkp [1]. To fix this, make isolate_movable_page() skip pages with the PageSlab flag set. This is a bit tricky as we need to add memory barriers to SLAB and SLUB's page allocation and freeing, and their counterparts to isolate_movable_page(). Based on my RFC from [2]. Added a comment update from Matthew's variant in [3] and, as done there, moved the PageSlab checks to happen before trying to take the page lock. [1] https://lore.kernel.org/all/208c1757-5edd-fd42-67d4-1940cc43b50f@intel.com/ [2] https://lore.kernel.org/all/aec59f53-0e53-1736-5932-25407125d4d4@suse.cz/ [3] https://lore.kernel.org/all/YzsVM8eToHUeTP75@casper.infradead.org/ Reported-by: kernel test robot <yujie.liu@intel.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> diff 81895a65 Wed Oct 05 08:43:38 MDT 2022 Jason A. Donenfeld <Jason@zx2c4.com> treewide: use prandom_u32_max() when possible, part 1 Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> diff b731e357 Fri Sep 30 04:07:30 MDT 2022 Feng Tang <feng.tang@intel.com> mm/slub: fix a slab missed to be freed problem When enable kasan and kfence's in-kernel kunit test with slub_debug on, it caught a problem (in linux-next tree): ------------[ cut here ]------------ kmem_cache_destroy test: Slab cache still has objects when called from test_exit+0x1a/0x30 WARNING: CPU: 3 PID: 240 at mm/slab_common.c:492 kmem_cache_destroy+0x16c/0x170 Modules linked in: CPU: 3 PID: 240 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc7-next-20220929 #52 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:kmem_cache_destroy+0x16c/0x170 Code: 41 5c 41 5d e9 a5 04 0b 00 c3 cc cc cc cc 48 8b 55 60 48 8b 4c 24 20 48 c7 c6 40 37 d2 82 48 c7 c7 e8 a0 33 83 e8 4e d7 14 01 <0f> 0b eb a7 41 56 41 89 d6 41 55 49 89 f5 41 54 49 89 fc 55 48 89 RSP: 0000:ffff88800775fea0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff83bdec48 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 1ffff11000eebf9e RDI: ffffed1000eebfc6 RBP: ffff88804362fa00 R08: ffffffff81182e58 R09: ffff88800775fbdf R10: ffffed1000eebf7b R11: 0000000000000001 R12: 000000008c800d00 R13: ffff888005e78040 R14: 0000000000000000 R15: ffff888005cdfad0 FS: 0000000000000000(0000) GS:ffff88807ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000000360e001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> test_exit+0x1a/0x30 kunit_try_run_case+0xad/0xc0 kunit_generic_run_threadfn_adapter+0x26/0x50 kthread+0x17b/0x1b0 It was biscted to commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug caches and make it safe") The problem is inside free_debug_processing(), under certain circumstances the slab can be removed from the partial list but not freed by discard_slab() and thus n->nr_slabs is not decreased accordingly. During shutdown, this non-zero n->nr_slabs is detected and reported. Specifically, the problem is that there are two checks for detecting a full partial list by comparing n->nr_partial >= s->min_partial where the latter check is affected by remove_partial() decreasing n->nr_partial between the checks. Reoganize the code so there is a single check upfront. Link: https://lore.kernel.org/all/20220930100730.250248-1-feng.tang@intel.com/ Fixes: c7323a5ad078 ("mm/slub: restrict sysfs validation to debug caches and make it safe") Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff b731e357 Fri Sep 30 04:07:30 MDT 2022 Feng Tang <feng.tang@intel.com> mm/slub: fix a slab missed to be freed problem When enable kasan and kfence's in-kernel kunit test with slub_debug on, it caught a problem (in linux-next tree): ------------[ cut here ]------------ kmem_cache_destroy test: Slab cache still has objects when called from test_exit+0x1a/0x30 WARNING: CPU: 3 PID: 240 at mm/slab_common.c:492 kmem_cache_destroy+0x16c/0x170 Modules linked in: CPU: 3 PID: 240 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc7-next-20220929 #52 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:kmem_cache_destroy+0x16c/0x170 Code: 41 5c 41 5d e9 a5 04 0b 00 c3 cc cc cc cc 48 8b 55 60 48 8b 4c 24 20 48 c7 c6 40 37 d2 82 48 c7 c7 e8 a0 33 83 e8 4e d7 14 01 <0f> 0b eb a7 41 56 41 89 d6 41 55 49 89 f5 41 54 49 89 fc 55 48 89 RSP: 0000:ffff88800775fea0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff83bdec48 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 1ffff11000eebf9e RDI: ffffed1000eebfc6 RBP: ffff88804362fa00 R08: ffffffff81182e58 R09: ffff88800775fbdf R10: ffffed1000eebf7b R11: 0000000000000001 R12: 000000008c800d00 R13: ffff888005e78040 R14: 0000000000000000 R15: ffff888005cdfad0 FS: 0000000000000000(0000) GS:ffff88807ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000000360e001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> test_exit+0x1a/0x30 kunit_try_run_case+0xad/0xc0 kunit_generic_run_threadfn_adapter+0x26/0x50 kthread+0x17b/0x1b0 It was biscted to commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug caches and make it safe") The problem is inside free_debug_processing(), under certain circumstances the slab can be removed from the partial list but not freed by discard_slab() and thus n->nr_slabs is not decreased accordingly. During shutdown, this non-zero n->nr_slabs is detected and reported. Specifically, the problem is that there are two checks for detecting a full partial list by comparing n->nr_partial >= s->min_partial where the latter check is affected by remove_partial() decreasing n->nr_partial between the checks. Reoganize the code so there is a single check upfront. Link: https://lore.kernel.org/all/20220930100730.250248-1-feng.tang@intel.com/ Fixes: c7323a5ad078 ("mm/slub: restrict sysfs validation to debug caches and make it safe") Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> |
/linux-master/include/linux/ | ||
H A D | slab.h | diff f88c3fb8 Tue Mar 12 21:32:19 MDT 2024 Linus Torvalds <torvalds@linux-foundation.org> mm, slab: remove last vestiges of SLAB_MEM_SPREAD Yes, yes, I know the slab people were planning on going slow and letting every subsystem fight this thing on their own. But let's just rip off the band-aid and get it over and done with. I don't want to see a number of unnecessary pull requests just to get rid of a flag that no longer has any meaning. This was mainly done with a couple of 'sed' scripts and then some manual cleanup of the end result. Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 3c615294 Fri Jul 14 00:44:22 MDT 2023 GONG, Ruiqi <gongruiqi@huaweicloud.com> Randomized slab caches for kmalloc() When exploiting memory vulnerabilities, "heap spraying" is a common technique targeting those related to dynamic memory allocation (i.e. the "heap"), and it plays an important role in a successful exploitation. Basically, it is to overwrite the memory area of vulnerable object by triggering allocation in other subsystems or modules and therefore getting a reference to the targeted memory location. It's usable on various types of vulnerablity including use after free (UAF), heap out- of-bound write and etc. There are (at least) two reasons why the heap can be sprayed: 1) generic slab caches are shared among different subsystems and modules, and 2) dedicated slab caches could be merged with the generic ones. Currently these two factors cannot be prevented at a low cost: the first one is a widely used memory allocation mechanism, and shutting down slab merging completely via `slub_nomerge` would be overkill. To efficiently prevent heap spraying, we propose the following approach: to create multiple copies of generic slab caches that will never be merged, and random one of them will be used at allocation. The random selection is based on the address of code that calls `kmalloc()`, which means it is static at runtime (rather than dynamically determined at each time of allocation, which could be bypassed by repeatedly spraying in brute force). In other words, the randomness of cache selection will be with respect to the code address rather than time, i.e. allocations in different code paths would most likely pick different caches, although kmalloc() at each place would use the same cache copy whenever it is executed. In this way, the vulnerable object and memory allocated in other subsystems and modules will (most probably) be on different slab caches, which prevents the object from being sprayed. Meanwhile, the static random selection is further enhanced with a per-boot random seed, which prevents the attacker from finding a usable kmalloc that happens to pick the same cache with the vulnerable subsystem/module by analyzing the open source code. In other words, with the per-boot seed, the random selection is static during each time the system starts and runs, but not across different system startups. The overhead of performance has been tested on a 40-core x86 server by comparing the results of `perf bench all` between the kernels with and without this patch based on the latest linux-next kernel, which shows minor difference. A subset of benchmarks are listed below: sched/ sched/ syscall/ mem/ mem/ messaging pipe basic memcpy memset (sec) (sec) (sec) (GB/sec) (GB/sec) control1 0.019 5.459 0.733 15.258789 51.398026 control2 0.019 5.439 0.730 16.009221 48.828125 control3 0.019 5.282 0.735 16.009221 48.828125 control_avg 0.019 5.393 0.733 15.759077 49.684759 experiment1 0.019 5.374 0.741 15.500992 46.502976 experiment2 0.019 5.440 0.746 16.276042 51.398026 experiment3 0.019 5.242 0.752 15.258789 51.398026 experiment_avg 0.019 5.352 0.746 15.678608 49.766343 The overhead of memory usage was measured by executing `free` after boot on a QEMU VM with 1GB total memory, and as expected, it's positively correlated with # of cache copies: control 4 copies 8 copies 16 copies total 969.8M 968.2M 968.2M 968.2M used 20.0M 21.9M 24.1M 26.7M free 936.9M 933.6M 931.4M 928.6M available 932.2M 928.8M 926.6M 923.9M Co-developed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Dennis Zhou <dennis@kernel.org> # percpu Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 4ab5f8ec Mon Jun 12 09:31:45 MDT 2023 Catalin Marinas <catalin.marinas@arm.com> mm/slab: decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Patch series "mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8", v7. A series reducing the kmalloc() minimum alignment on arm64 to 8 (from 128). This patch (of 17): In preparation for supporting a kmalloc() minimum alignment smaller than the arch DMA alignment, decouple the two definitions. This requires that either the kmalloc() caches are aligned to a (run-time) cache-line size or the DMA API bounces unaligned kmalloc() allocations. Subsequent patches will implement both options. After this patch, ARCH_DMA_MINALIGN is expected to be used in static alignment annotations and defined by an architecture to be the maximum alignment for all supported configurations/SoCs in a single Image. Architectures opting in to a smaller ARCH_KMALLOC_MINALIGN will need to define its value in the arch headers. Since ARCH_DMA_MINALIGN is now always defined, adjust the #ifdef in dma_get_cache_alignment() so that there is no change for architectures not requiring a minimum DMA alignment. Link: https://lkml.kernel.org/r/20230612153201.554742-1-catalin.marinas@arm.com Link: https://lkml.kernel.org/r/20230612153201.554742-2-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Will Deacon <will@kernel.org> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> diff 4ab5f8ec Mon Jun 12 09:31:45 MDT 2023 Catalin Marinas <catalin.marinas@arm.com> mm/slab: decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Patch series "mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8", v7. A series reducing the kmalloc() minimum alignment on arm64 to 8 (from 128). This patch (of 17): In preparation for supporting a kmalloc() minimum alignment smaller than the arch DMA alignment, decouple the two definitions. This requires that either the kmalloc() caches are aligned to a (run-time) cache-line size or the DMA API bounces unaligned kmalloc() allocations. Subsequent patches will implement both options. After this patch, ARCH_DMA_MINALIGN is expected to be used in static alignment annotations and defined by an architecture to be the maximum alignment for all supported configurations/SoCs in a single Image. Architectures opting in to a smaller ARCH_KMALLOC_MINALIGN will need to define its value in the arch headers. Since ARCH_DMA_MINALIGN is now always defined, adjust the #ifdef in dma_get_cache_alignment() so that there is no change for architectures not requiring a minimum DMA alignment. Link: https://lkml.kernel.org/r/20230612153201.554742-1-catalin.marinas@arm.com Link: https://lkml.kernel.org/r/20230612153201.554742-2-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Will Deacon <will@kernel.org> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> diff 4ab5f8ec Mon Jun 12 09:31:45 MDT 2023 Catalin Marinas <catalin.marinas@arm.com> mm/slab: decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN Patch series "mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8", v7. A series reducing the kmalloc() minimum alignment on arm64 to 8 (from 128). This patch (of 17): In preparation for supporting a kmalloc() minimum alignment smaller than the arch DMA alignment, decouple the two definitions. This requires that either the kmalloc() caches are aligned to a (run-time) cache-line size or the DMA API bounces unaligned kmalloc() allocations. Subsequent patches will implement both options. After this patch, ARCH_DMA_MINALIGN is expected to be used in static alignment annotations and defined by an architecture to be the maximum alignment for all supported configurations/SoCs in a single Image. Architectures opting in to a smaller ARCH_KMALLOC_MINALIGN will need to define its value in the arch headers. Since ARCH_DMA_MINALIGN is now always defined, adjust the #ifdef in dma_get_cache_alignment() so that there is no change for architectures not requiring a minimum DMA alignment. Link: https://lkml.kernel.org/r/20230612153201.554742-1-catalin.marinas@arm.com Link: https://lkml.kernel.org/r/20230612153201.554742-2-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Will Deacon <will@kernel.org> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> diff 9ed9cac1 Fri Sep 23 14:28:07 MDT 2022 Kees Cook <keescook@chromium.org> slab: Remove __malloc attribute from realloc functions The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed. Thanks to Geert Uytterhoeven <geert@linux-m68k.org> for reporting build failures with gcc-8 in earlier version which tried to remove the #ifdef. While the "alloc_size" attribute is available on all GCC versions, I forgot that it gets disabled explicitly by the kernel in GCC < 9.1 due to misbehaviors. Add a note to the compiler_attributes.h entry for it. Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Marco Elver <elver@google.com> Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 6edf2576 Tue Sep 13 00:54:20 MDT 2022 Feng Tang <feng.tang@intel.com> mm/slub: enable debugging memory wasting of kmalloc kmalloc's API family is critical for mm, with one nature that it will round up the request size to a fixed one (mostly power of 2). Say when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes could be allocated, so in worst case, there is around 50% memory space waste. The wastage is not a big issue for requests that get allocated/freed quickly, but may cause problems with objects that have longer life time. We've met a kernel boot OOM panic (v5.10), and from the dumped slab info: [ 26.062145] kmalloc-2k 814056KB 814056KB From debug we found there are huge number of 'struct iova_magazine', whose size is 1032 bytes (1024 + 8), so each allocation will waste 1016 bytes. Though the issue was solved by giving the right (bigger) size of RAM, it is still nice to optimize the size (either use a kmalloc friendly size or create a dedicated slab for it). And from lkml archive, there was another crash kernel OOM case [1] back in 2019, which seems to be related with the similar slab waste situation, as the log is similar: [ 4.332648] iommu: Adding device 0000:20:02.0 to group 16 [ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0 ... [ 4.857565] kmalloc-2048 59164KB 59164KB The crash kernel only has 256M memory, and 59M is pretty big here. (Note: the related code has been changed and optimised in recent kernel [2], these logs are just picked to demo the problem, also a patch changing its size to 1024 bytes has been merged) So add an way to track each kmalloc's memory waste info, and leverage the existing SLUB debug framework (specifically SLUB_STORE_USER) to show its call stack of original allocation, so that user can evaluate the waste situation, identify some hot spots and optimize accordingly, for a better utilization of memory. The waste info is integrated into existing interface: '/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of 'kmalloc-4k' after boot is: 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1 __kmem_cache_alloc_node+0x11f/0x4e0 __kmalloc_node+0x4e/0x140 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe] ixgbe_probe+0x165f/0x1d20 [ixgbe] local_pci_probe+0x78/0xc0 work_for_cpu_fn+0x26/0x40 ... which means in 'kmalloc-4k' slab, there are 126 requests of 2240 bytes which got a 4KB space (wasting 1856 bytes each and 233856 bytes in total), from ixgbe_alloc_q_vector(). And when system starts some real workload like multiple docker instances, there could are more severe waste. [1]. https://lkml.org/lkml/2019/8/12/266 [2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/ [Thanks Hyeonggon for pointing out several bugs about sorting/format] [Thanks Vlastimil for suggesting way to reduce memory usage of orig_size and keep it only for kmalloc objects] Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: John Garry <john.garry@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 6edf2576 Tue Sep 13 00:54:20 MDT 2022 Feng Tang <feng.tang@intel.com> mm/slub: enable debugging memory wasting of kmalloc kmalloc's API family is critical for mm, with one nature that it will round up the request size to a fixed one (mostly power of 2). Say when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes could be allocated, so in worst case, there is around 50% memory space waste. The wastage is not a big issue for requests that get allocated/freed quickly, but may cause problems with objects that have longer life time. We've met a kernel boot OOM panic (v5.10), and from the dumped slab info: [ 26.062145] kmalloc-2k 814056KB 814056KB From debug we found there are huge number of 'struct iova_magazine', whose size is 1032 bytes (1024 + 8), so each allocation will waste 1016 bytes. Though the issue was solved by giving the right (bigger) size of RAM, it is still nice to optimize the size (either use a kmalloc friendly size or create a dedicated slab for it). And from lkml archive, there was another crash kernel OOM case [1] back in 2019, which seems to be related with the similar slab waste situation, as the log is similar: [ 4.332648] iommu: Adding device 0000:20:02.0 to group 16 [ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0 ... [ 4.857565] kmalloc-2048 59164KB 59164KB The crash kernel only has 256M memory, and 59M is pretty big here. (Note: the related code has been changed and optimised in recent kernel [2], these logs are just picked to demo the problem, also a patch changing its size to 1024 bytes has been merged) So add an way to track each kmalloc's memory waste info, and leverage the existing SLUB debug framework (specifically SLUB_STORE_USER) to show its call stack of original allocation, so that user can evaluate the waste situation, identify some hot spots and optimize accordingly, for a better utilization of memory. The waste info is integrated into existing interface: '/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of 'kmalloc-4k' after boot is: 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1 __kmem_cache_alloc_node+0x11f/0x4e0 __kmalloc_node+0x4e/0x140 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe] ixgbe_probe+0x165f/0x1d20 [ixgbe] local_pci_probe+0x78/0xc0 work_for_cpu_fn+0x26/0x40 ... which means in 'kmalloc-4k' slab, there are 126 requests of 2240 bytes which got a 4KB space (wasting 1856 bytes each and 233856 bytes in total), from ixgbe_alloc_q_vector(). And when system starts some real workload like multiple docker instances, there could are more severe waste. [1]. https://lkml.org/lkml/2019/8/12/266 [2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/ [Thanks Hyeonggon for pointing out several bugs about sorting/format] [Thanks Vlastimil for suggesting way to reduce memory usage of orig_size and keep it only for kmalloc objects] Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: John Garry <john.garry@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 8dfa9d55 Wed Aug 17 04:18:25 MDT 2022 Hyeonggon Yoo <42.hyeyoo@gmail.com> mm/slab_common: move declaration of __ksize() to mm/slab.h __ksize() is only called by KASAN. Remove export symbol and move declaration to mm/slab.h as we don't want to grow its callers. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> diff 8cf9e121 Thu Apr 28 09:59:39 MDT 2022 Andrey Konovalov <andreyknvl@gmail.com> mm: slab: fix comment for ARCH_KMALLOC_MINALIGN The comment next to the ARCH_KMALLOC_MINALIGN definition says that ARCH_KMALLOC_MINALIGN can be defined in arch headers. This is incorrect: it's actually ARCH_DMA_MINALIGN that can be defined there. Fix the comment. Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/fe1ca7a25a054b61d1038686d07569416e287e7b.1651161548.git.andreyknvl@google.com |
Completed in 688 milliseconds