Searched hist:3816 (Results 26 - 37 of 37) sorted by relevance
/linux-master/sound/soc/sof/ | ||
H A D | sof-audio.c | diff 3816bbea Thu Mar 17 11:50:42 MDT 2022 Ranjani Sridharan <ranjani.sridharan@linux.intel.com> ASoC: SOF: expose sof_route_setup() This will be used in IPC3-specific code. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20220317175044.1752400-18-ranjani.sridharan@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> |
/linux-master/drivers/tty/serial/ | ||
H A D | xilinx_uartps.c | diff 3816b2f8 Thu Sep 15 03:15:29 MDT 2016 Nava kishore Manne <nava.manne@xilinx.com> serial: xuartps: Adds RXBS register support for zynqmp This patch adds RXBS register access support for zynqmp. To avoid the corner error conditions it will consider only RXBS[2:0] bits while checking the error conditions (Parity,Framing and BRAK). Signed-off-by: Nava kishore Manne <navam@xilinx.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
/linux-master/drivers/scsi/mpt3sas/ | ||
H A D | mpt3sas_base.h | diff ff92b9dd Thu Oct 25 08:03:40 MDT 2018 Suganath Prabu <suganath-prabu.subramani@broadcom.com> scsi: mpt3sas: Update MPI headers to support Aero controllers Updating MPI headers to the latest version 2.6.7 to add support to the driver to detect the new 3816 and 3916 chip based controllers. Separate out firmware image data from mpi2_ioc.h to new file mpi2_image.h Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> |
/linux-master/mm/damon/ | ||
H A D | core.c | diff 13b2a4b2 Thu Nov 09 22:37:09 MST 2023 Hyeongtak Ji <hyeongtak.ji@sk.com> mm/damon/core.c: avoid unintentional filtering out of schemes The function '__damos_filter_out()' causes DAMON to always filter out schemes whose filter type is anon or memcg if its matching value is set to false. This commit addresses the issue by ensuring that '__damos_filter_out()' no longer applies to filters whose type is 'anon' or 'memcg'. Link: https://lkml.kernel.org/r/1699594629-3816-1-git-send-email-hyeongtak.ji@gmail.com Fixes: ab9bda001b681 ("mm/damon/core: introduce address range type damos filter") Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
/linux-master/include/linux/ | ||
H A D | usb.h | diff 9c6d7788 Fri Aug 26 13:31:32 MDT 2022 Alan Stern <stern@rowland.harvard.edu> USB: core: Prevent nested device-reset calls Automatic kernel fuzzing revealed a recursive locking violation in usb-storage: ============================================ WARNING: possible recursive locking detected 5.18.0 #3 Not tainted -------------------------------------------- kworker/1:3/1205 is trying to acquire lock: ffff888018638db8 (&us_interface_key[i]){+.+.}-{3:3}, at: usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230 but task is already holding lock: ffff888018638db8 (&us_interface_key[i]){+.+.}-{3:3}, at: usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230 ... stack backtrace: CPU: 1 PID: 1205 Comm: kworker/1:3 Not tainted 5.18.0 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: usb_hub_wq hub_event Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2988 [inline] check_deadlock kernel/locking/lockdep.c:3031 [inline] validate_chain kernel/locking/lockdep.c:3816 [inline] __lock_acquire.cold+0x152/0x3ca kernel/locking/lockdep.c:5053 lock_acquire kernel/locking/lockdep.c:5665 [inline] lock_acquire+0x1ab/0x520 kernel/locking/lockdep.c:5630 __mutex_lock_common kernel/locking/mutex.c:603 [inline] __mutex_lock+0x14f/0x1610 kernel/locking/mutex.c:747 usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230 usb_reset_device+0x37d/0x9a0 drivers/usb/core/hub.c:6109 r871xu_dev_remove+0x21a/0x270 drivers/staging/rtl8712/usb_intf.c:622 usb_unbind_interface+0x1bd/0x890 drivers/usb/core/driver.c:458 device_remove drivers/base/dd.c:545 [inline] device_remove+0x11f/0x170 drivers/base/dd.c:537 __device_release_driver drivers/base/dd.c:1222 [inline] device_release_driver_internal+0x1a7/0x2f0 drivers/base/dd.c:1248 usb_driver_release_interface+0x102/0x180 drivers/usb/core/driver.c:627 usb_forced_unbind_intf+0x4d/0xa0 drivers/usb/core/driver.c:1118 usb_reset_device+0x39b/0x9a0 drivers/usb/core/hub.c:6114 This turned out not to be an error in usb-storage but rather a nested device reset attempt. That is, as the rtl8712 driver was being unbound from a composite device in preparation for an unrelated USB reset (that driver does not have pre_reset or post_reset callbacks), its ->remove routine called usb_reset_device() -- thus nesting one reset call within another. Performing a reset as part of disconnect processing is a questionable practice at best. However, the bug report points out that the USB core does not have any protection against nested resets. Adding a reset_in_progress flag and testing it will prevent such errors in the future. Link: https://lore.kernel.org/all/CAB7eexKUpvX-JNiLzhXBDWgfg2T9e9_0Tw4HQ6keN==voRbP0g@mail.gmail.com/ Cc: stable@vger.kernel.org Reported-and-tested-by: Rondreis <linhaoguo86@gmail.com> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/YwkflDxvg0KWqyZK@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
/linux-master/drivers/gpu/drm/amd/amdgpu/ | ||
H A D | gmc_v9_0.c | diff 3816e42f Tue Jan 09 11:47:37 MST 2018 Christian König <christian.koenig@amd.com> drm/amdgpu: rename pas_id to pasid sed -i "s/pas_id/pasid/g" drivers/gpu/drm/amd/amdgpu/*.c sed -i "s/pas_id/pasid/g" drivers/gpu/drm/amd/amdgpu/*.h Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> |
/linux-master/net/tipc/ | ||
H A D | socket.c | diff f172f4b8 Sun Nov 29 11:32:49 MST 2020 Randy Dunlap <rdunlap@infradead.org> net/tipc: fix socket.c kernel-doc Fix socket.c kernel-doc warnings in preparation for adding to the networking docbook. Also, for rcvbuf_limit(), use bullet notation so that the lines do not run together. ../net/tipc/socket.c:130: warning: Function parameter or member 'cong_links' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'probe_unacked' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'snd_win' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'peer_caps' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'rcv_win' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'group' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'oneway' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'nagle_start' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'snd_backlog' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'msg_acc' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'pkt_cnt' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'expect_ack' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'nodelay' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'group_is_open' not described in 'tipc_sock' ../net/tipc/socket.c:267: warning: Function parameter or member 'sk' not described in 'tsk_advance_rx_queue' ../net/tipc/socket.c:295: warning: Function parameter or member 'sk' not described in 'tsk_rej_rx_queue' ../net/tipc/socket.c:295: warning: Function parameter or member 'error' not described in 'tsk_rej_rx_queue' ../net/tipc/socket.c:894: warning: Function parameter or member 'tsk' not described in 'tipc_send_group_msg' ../net/tipc/socket.c:1187: warning: Function parameter or member 'net' not described in 'tipc_sk_mcast_rcv' ../net/tipc/socket.c:1323: warning: Function parameter or member 'inputq' not described in 'tipc_sk_conn_proto_rcv' ../net/tipc/socket.c:1323: warning: Function parameter or member 'xmitq' not described in 'tipc_sk_conn_proto_rcv' ../net/tipc/socket.c:1885: warning: Function parameter or member 'sock' not described in 'tipc_recvmsg' ../net/tipc/socket.c:1993: warning: Function parameter or member 'sock' not described in 'tipc_recvstream' ../net/tipc/socket.c:2313: warning: Function parameter or member 'xmitq' not described in 'tipc_sk_filter_rcv' ../net/tipc/socket.c:2404: warning: Function parameter or member 'xmitq' not described in 'tipc_sk_enqueue' ../net/tipc/socket.c:2456: warning: Function parameter or member 'net' not described in 'tipc_sk_rcv' ../net/tipc/socket.c:2693: warning: Function parameter or member 'kern' not described in 'tipc_accept' ../net/tipc/socket.c:3816: warning: Excess function parameter 'sysctl_tipc_sk_filter' description in 'tipc_sk_filtering' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
/linux-master/drivers/usb/core/ | ||
H A D | hub.c | diff 9c6d7788 Fri Aug 26 13:31:32 MDT 2022 Alan Stern <stern@rowland.harvard.edu> USB: core: Prevent nested device-reset calls Automatic kernel fuzzing revealed a recursive locking violation in usb-storage: ============================================ WARNING: possible recursive locking detected 5.18.0 #3 Not tainted -------------------------------------------- kworker/1:3/1205 is trying to acquire lock: ffff888018638db8 (&us_interface_key[i]){+.+.}-{3:3}, at: usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230 but task is already holding lock: ffff888018638db8 (&us_interface_key[i]){+.+.}-{3:3}, at: usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230 ... stack backtrace: CPU: 1 PID: 1205 Comm: kworker/1:3 Not tainted 5.18.0 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: usb_hub_wq hub_event Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2988 [inline] check_deadlock kernel/locking/lockdep.c:3031 [inline] validate_chain kernel/locking/lockdep.c:3816 [inline] __lock_acquire.cold+0x152/0x3ca kernel/locking/lockdep.c:5053 lock_acquire kernel/locking/lockdep.c:5665 [inline] lock_acquire+0x1ab/0x520 kernel/locking/lockdep.c:5630 __mutex_lock_common kernel/locking/mutex.c:603 [inline] __mutex_lock+0x14f/0x1610 kernel/locking/mutex.c:747 usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230 usb_reset_device+0x37d/0x9a0 drivers/usb/core/hub.c:6109 r871xu_dev_remove+0x21a/0x270 drivers/staging/rtl8712/usb_intf.c:622 usb_unbind_interface+0x1bd/0x890 drivers/usb/core/driver.c:458 device_remove drivers/base/dd.c:545 [inline] device_remove+0x11f/0x170 drivers/base/dd.c:537 __device_release_driver drivers/base/dd.c:1222 [inline] device_release_driver_internal+0x1a7/0x2f0 drivers/base/dd.c:1248 usb_driver_release_interface+0x102/0x180 drivers/usb/core/driver.c:627 usb_forced_unbind_intf+0x4d/0xa0 drivers/usb/core/driver.c:1118 usb_reset_device+0x39b/0x9a0 drivers/usb/core/hub.c:6114 This turned out not to be an error in usb-storage but rather a nested device reset attempt. That is, as the rtl8712 driver was being unbound from a composite device in preparation for an unrelated USB reset (that driver does not have pre_reset or post_reset callbacks), its ->remove routine called usb_reset_device() -- thus nesting one reset call within another. Performing a reset as part of disconnect processing is a questionable practice at best. However, the bug report points out that the USB core does not have any protection against nested resets. Adding a reset_in_progress flag and testing it will prevent such errors in the future. Link: https://lore.kernel.org/all/CAB7eexKUpvX-JNiLzhXBDWgfg2T9e9_0Tw4HQ6keN==voRbP0g@mail.gmail.com/ Cc: stable@vger.kernel.org Reported-and-tested-by: Rondreis <linhaoguo86@gmail.com> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/YwkflDxvg0KWqyZK@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
/linux-master/drivers/md/ | ||
H A D | dm.c | diff c7c879ee Thu Oct 21 15:13:25 MDT 2021 Michał Mirosław <mirq-linux@rere.qmqm.pl> dm: make workqueue names device-specific Add device number to kdmflush workqueue name to help debugging CPU usage. Resulting `ps axfu` snippet: root 3791 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:7] root 3792 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kcryptd_io/253:7] root 3793 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kcryptd/253:7] root 3794 0.0 0.0 0 0 ? S paź19 0:00 \_ [dmcrypt_write/253:7] root 3814 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:8] root 3815 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:9] root 3816 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:10] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Mike Snitzer <snitzer@redhat.com> |
/linux-master/net/mac80211/ | ||
H A D | cfg.c | diff e160ab85 Thu Feb 02 19:36:36 MST 2023 Ping-Ke Shih <pkshih@realtek.com> wifi: mac80211: don't return unset power in ieee80211_get_tx_power() We can get a UBSAN warning if ieee80211_get_tx_power() returns the INT_MIN value mac80211 internally uses for "unset power level". UBSAN: signed-integer-overflow in net/wireless/nl80211.c:3816:5 -2147483648 * 100 cannot be represented in type 'int' CPU: 0 PID: 20433 Comm: insmod Tainted: G WC OE Call Trace: dump_stack+0x74/0x92 ubsan_epilogue+0x9/0x50 handle_overflow+0x8d/0xd0 __ubsan_handle_mul_overflow+0xe/0x10 nl80211_send_iface+0x688/0x6b0 [cfg80211] [...] cfg80211_register_wdev+0x78/0xb0 [cfg80211] cfg80211_netdev_notifier_call+0x200/0x620 [cfg80211] [...] ieee80211_if_add+0x60e/0x8f0 [mac80211] ieee80211_register_hw+0xda5/0x1170 [mac80211] In this case, simply return an error instead, to indicate that no data is available. Cc: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://lore.kernel.org/r/20230203023636.4418-1-pkshih@realtek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> |
/linux-master/mm/ | ||
H A D | internal.h | diff 6bb15450 Fri Dec 28 01:35:41 MST 2018 Mel Gorman <mgorman@techsingularity.net> mm, page_alloc: spread allocations across zones before introducing fragmentation Patch series "Fragmentation avoidance improvements", v5. It has been noted before that fragmentation avoidance (aka anti-fragmentation) is not perfect. Given sufficient time or an adverse workload, memory gets fragmented and the long-term success of high-order allocations degrades. This series defines an adverse workload, a definition of external fragmentation events (including serious) ones and a series that reduces the level of those fragmentation events. The details of the workload and the consequences are described in more detail in the changelogs. However, from patch 1, this is a high-level summary of the adverse workload. The exact details are found in the mmtests implementation. The broad details of the workload are as follows; 1. Create an XFS filesystem (not specified in the configuration but done as part of the testing for this patch) 2. Start 4 fio threads that write a number of 64K files inefficiently. Inefficiently means that files are created on first access and not created in advance (fio parameterr create_on_open=1) and fallocate is not used (fallocate=none). With multiple IO issuers this creates a mix of slab and page cache allocations over time. The total size of the files is 150% physical memory so that the slabs and page cache pages get mixed 3. Warm up a number of fio read-only threads accessing the same files created in step 2. This part runs for the same length of time it took to create the files. It'll fault back in old data and further interleave slab and page cache allocations. As it's now low on memory due to step 2, fragmentation occurs as pageblocks get stolen. 4. While step 3 is still running, start a process that tries to allocate 75% of memory as huge pages with a number of threads. The number of threads is based on a (NR_CPUS_SOCKET - NR_FIO_THREADS)/4 to avoid THP threads contending with fio, any other threads or forcing cross-NUMA scheduling. Note that the test has not been used on a machine with less than 8 cores. The benchmark records whether huge pages were allocated and what the fault latency was in microseconds 5. Measure the number of events potentially causing external fragmentation, the fault latency and the huge page allocation success rate. 6. Cleanup Overall the series reduces external fragmentation causing events by over 94% on 1 and 2 socket machines, which in turn impacts high-order allocation success rates over the long term. There are differences in latencies and high-order allocation success rates. Latencies are a mixed bag as they are vulnerable to exact system state and whether allocations succeeded so they are treated as a secondary metric. Patch 1 uses lower zones if they are populated and have free memory instead of fragmenting a higher zone. It's special cased to handle a Normal->DMA32 fallback with the reasons explained in the changelog. Patch 2-4 boosts watermarks temporarily when an external fragmentation event occurs. kswapd wakes to reclaim a small amount of old memory and then wakes kcompactd on completion to recover the system slightly. This introduces some overhead in the slowpath. The level of boosting can be tuned or disabled depending on the tolerance for fragmentation vs allocation latency. Patch 5 stalls some movable allocation requests to let kswapd from patch 4 make some progress. The duration of the stalls is very low but it is possible to tune the system to avoid fragmentation events if larger stalls can be tolerated. The bulk of the improvement in fragmentation avoidance is from patches 1-4 but patch 5 can deal with a rare corner case and provides the option of tuning a system for THP allocation success rates in exchange for some stalls to control fragmentation. This patch (of 5): The page allocator zone lists are iterated based on the watermarks of each zone which does not take anti-fragmentation into account. On x86, node 0 may have multiple zones while other nodes have one zone. A consequence is that tasks running on node 0 may fragment ZONE_NORMAL even though ZONE_DMA32 has plenty of free memory. This patch special cases the allocator fast path such that it'll try an allocation from a lower local zone before fragmenting a higher zone. In this case, stealing of pageblocks or orders larger than a pageblock are still allowed in the fast path as they are uninteresting from a fragmentation point of view. This was evaluated using a benchmark designed to fragment memory before attempting THP allocations. It's implemented in mmtests as the following configurations configs/config-global-dhp__workload_thpfioscale configs/config-global-dhp__workload_thpfioscale-defrag configs/config-global-dhp__workload_thpfioscale-madvhugepage e.g. from mmtests ./run-mmtests.sh --run-monitor --config configs/config-global-dhp__workload_thpfioscale test-run-1 The broad details of the workload are as follows; 1. Create an XFS filesystem (not specified in the configuration but done as part of the testing for this patch). 2. Start 4 fio threads that write a number of 64K files inefficiently. Inefficiently means that files are created on first access and not created in advance (fio parameter create_on_open=1) and fallocate is not used (fallocate=none). With multiple IO issuers this creates a mix of slab and page cache allocations over time. The total size of the files is 150% physical memory so that the slabs and page cache pages get mixed. 3. Warm up a number of fio read-only processes accessing the same files created in step 2. This part runs for the same length of time it took to create the files. It'll refault old data and further interleave slab and page cache allocations. As it's now low on memory due to step 2, fragmentation occurs as pageblocks get stolen. 4. While step 3 is still running, start a process that tries to allocate 75% of memory as huge pages with a number of threads. The number of threads is based on a (NR_CPUS_SOCKET - NR_FIO_THREADS)/4 to avoid THP threads contending with fio, any other threads or forcing cross-NUMA scheduling. Note that the test has not been used on a machine with less than 8 cores. The benchmark records whether huge pages were allocated and what the fault latency was in microseconds. 5. Measure the number of events potentially causing external fragmentation, the fault latency and the huge page allocation success rate. 6. Cleanup the test files. Note that due to the use of IO and page cache that this benchmark is not suitable for running on large machines where the time to fragment memory may be excessive. Also note that while this is one mix that generates fragmentation that it's not the only mix that generates fragmentation. Differences in workload that are more slab-intensive or whether SLUB is used with high-order pages may yield different results. When the page allocator fragments memory, it records the event using the mm_page_alloc_extfrag ftrace event. If the fallback_order is smaller than a pageblock order (order-9 on 64-bit x86) then it's considered to be an "external fragmentation event" that may cause issues in the future. Hence, the primary metric here is the number of external fragmentation events that occur with order < 9. The secondary metric is allocation latency and huge page allocation success rates but note that differences in latencies and what the success rate also can affect the number of external fragmentation event which is why it's a secondary metric. 1-socket Skylake machine config-global-dhp__workload_thpfioscale XFS (no special madvise) 4 fio threads, 1 THP allocating thread -------------------------------------- 4.20-rc3 extfrag events < order 9: 804694 4.20-rc3+patch: 408912 (49% reduction) thpfioscale Fault Latencies 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Amean fault-base-1 662.92 ( 0.00%) 653.58 * 1.41%* Amean fault-huge-1 0.00 ( 0.00%) 0.00 ( 0.00%) 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Percentage huge-1 0.00 ( 0.00%) 0.00 ( 0.00%) Fault latencies are slightly reduced while allocation success rates remain at zero as this configuration does not make any special effort to allocate THP and fio is heavily active at the time and either filling memory or keeping pages resident. However, a 49% reduction of serious fragmentation events reduces the changes of external fragmentation being a problem in the future. Vlastimil asked during review for a breakdown of the allocation types that are falling back. vanilla 3816 MIGRATE_UNMOVABLE 800845 MIGRATE_MOVABLE 33 MIGRATE_UNRECLAIMABLE patch 735 MIGRATE_UNMOVABLE 408135 MIGRATE_MOVABLE 42 MIGRATE_UNRECLAIMABLE The majority of the fallbacks are due to movable allocations and this is consistent for the workload throughout the series so will not be presented again as the primary source of fallbacks are movable allocations. Movable fallbacks are sometimes considered "ok" to fallback because they can be migrated. The problem is that they can fill an unmovable/reclaimable pageblock causing those allocations to fallback later and polluting pageblocks with pages that cannot move. If there is a movable fallback, it is pretty much guaranteed to affect an unmovable/reclaimable pageblock and while it might not be enough to actually cause a unmovable/reclaimable fallback in the future, we cannot know that in advance so the patch takes the only option available to it. Hence, it's important to control them. This point is also consistent throughout the series and will not be repeated. 1-socket Skylake machine global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE) ----------------------------------------------------------------- 4.20-rc3 extfrag events < order 9: 291392 4.20-rc3+patch: 191187 (34% reduction) thpfioscale Fault Latencies 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Amean fault-base-1 1495.14 ( 0.00%) 1467.55 ( 1.85%) Amean fault-huge-1 1098.48 ( 0.00%) 1127.11 ( -2.61%) thpfioscale Percentage Faults Huge 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Percentage huge-1 78.57 ( 0.00%) 77.64 ( -1.18%) Fragmentation events were reduced quite a bit although this is known to be a little variable. The latencies and allocation success rates are similar but they were already quite high. 2-socket Haswell machine config-global-dhp__workload_thpfioscale XFS (no special madvise) 4 fio threads, 5 THP allocating threads ---------------------------------------------------------------- 4.20-rc3 extfrag events < order 9: 215698 4.20-rc3+patch: 200210 (7% reduction) thpfioscale Fault Latencies 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Amean fault-base-5 1350.05 ( 0.00%) 1346.45 ( 0.27%) Amean fault-huge-5 4181.01 ( 0.00%) 3418.60 ( 18.24%) 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Percentage huge-5 1.15 ( 0.00%) 0.78 ( -31.88%) The reduction of external fragmentation events is slight and this is partially due to the removal of __GFP_THISNODE in commit ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings") as THP allocations can now spill over to remote nodes instead of fragmenting local memory. 2-socket Haswell machine global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE) ----------------------------------------------------------------- 4.20-rc3 extfrag events < order 9: 166352 4.20-rc3+patch: 147463 (11% reduction) thpfioscale Fault Latencies 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Amean fault-base-5 6138.97 ( 0.00%) 6217.43 ( -1.28%) Amean fault-huge-5 2294.28 ( 0.00%) 3163.33 * -37.88%* thpfioscale Percentage Faults Huge 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Percentage huge-5 96.82 ( 0.00%) 95.14 ( -1.74%) There was a slight reduction in external fragmentation events although the latencies were higher. The allocation success rate is high enough that the system is struggling and there is quite a lot of parallel reclaim and compaction activity. There is also a certain degree of luck on whether processes start on node 0 or not for this patch but the relevance is reduced later in the series. Overall, the patch reduces the number of external fragmentation causing events so the success of THP over long periods of time would be improved for this adverse workload. Link: http://lkml.kernel.org/r/20181123114528.28802-2-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
H A D | page_alloc.c | diff 6bb15450 Fri Dec 28 01:35:41 MST 2018 Mel Gorman <mgorman@techsingularity.net> mm, page_alloc: spread allocations across zones before introducing fragmentation Patch series "Fragmentation avoidance improvements", v5. It has been noted before that fragmentation avoidance (aka anti-fragmentation) is not perfect. Given sufficient time or an adverse workload, memory gets fragmented and the long-term success of high-order allocations degrades. This series defines an adverse workload, a definition of external fragmentation events (including serious) ones and a series that reduces the level of those fragmentation events. The details of the workload and the consequences are described in more detail in the changelogs. However, from patch 1, this is a high-level summary of the adverse workload. The exact details are found in the mmtests implementation. The broad details of the workload are as follows; 1. Create an XFS filesystem (not specified in the configuration but done as part of the testing for this patch) 2. Start 4 fio threads that write a number of 64K files inefficiently. Inefficiently means that files are created on first access and not created in advance (fio parameterr create_on_open=1) and fallocate is not used (fallocate=none). With multiple IO issuers this creates a mix of slab and page cache allocations over time. The total size of the files is 150% physical memory so that the slabs and page cache pages get mixed 3. Warm up a number of fio read-only threads accessing the same files created in step 2. This part runs for the same length of time it took to create the files. It'll fault back in old data and further interleave slab and page cache allocations. As it's now low on memory due to step 2, fragmentation occurs as pageblocks get stolen. 4. While step 3 is still running, start a process that tries to allocate 75% of memory as huge pages with a number of threads. The number of threads is based on a (NR_CPUS_SOCKET - NR_FIO_THREADS)/4 to avoid THP threads contending with fio, any other threads or forcing cross-NUMA scheduling. Note that the test has not been used on a machine with less than 8 cores. The benchmark records whether huge pages were allocated and what the fault latency was in microseconds 5. Measure the number of events potentially causing external fragmentation, the fault latency and the huge page allocation success rate. 6. Cleanup Overall the series reduces external fragmentation causing events by over 94% on 1 and 2 socket machines, which in turn impacts high-order allocation success rates over the long term. There are differences in latencies and high-order allocation success rates. Latencies are a mixed bag as they are vulnerable to exact system state and whether allocations succeeded so they are treated as a secondary metric. Patch 1 uses lower zones if they are populated and have free memory instead of fragmenting a higher zone. It's special cased to handle a Normal->DMA32 fallback with the reasons explained in the changelog. Patch 2-4 boosts watermarks temporarily when an external fragmentation event occurs. kswapd wakes to reclaim a small amount of old memory and then wakes kcompactd on completion to recover the system slightly. This introduces some overhead in the slowpath. The level of boosting can be tuned or disabled depending on the tolerance for fragmentation vs allocation latency. Patch 5 stalls some movable allocation requests to let kswapd from patch 4 make some progress. The duration of the stalls is very low but it is possible to tune the system to avoid fragmentation events if larger stalls can be tolerated. The bulk of the improvement in fragmentation avoidance is from patches 1-4 but patch 5 can deal with a rare corner case and provides the option of tuning a system for THP allocation success rates in exchange for some stalls to control fragmentation. This patch (of 5): The page allocator zone lists are iterated based on the watermarks of each zone which does not take anti-fragmentation into account. On x86, node 0 may have multiple zones while other nodes have one zone. A consequence is that tasks running on node 0 may fragment ZONE_NORMAL even though ZONE_DMA32 has plenty of free memory. This patch special cases the allocator fast path such that it'll try an allocation from a lower local zone before fragmenting a higher zone. In this case, stealing of pageblocks or orders larger than a pageblock are still allowed in the fast path as they are uninteresting from a fragmentation point of view. This was evaluated using a benchmark designed to fragment memory before attempting THP allocations. It's implemented in mmtests as the following configurations configs/config-global-dhp__workload_thpfioscale configs/config-global-dhp__workload_thpfioscale-defrag configs/config-global-dhp__workload_thpfioscale-madvhugepage e.g. from mmtests ./run-mmtests.sh --run-monitor --config configs/config-global-dhp__workload_thpfioscale test-run-1 The broad details of the workload are as follows; 1. Create an XFS filesystem (not specified in the configuration but done as part of the testing for this patch). 2. Start 4 fio threads that write a number of 64K files inefficiently. Inefficiently means that files are created on first access and not created in advance (fio parameter create_on_open=1) and fallocate is not used (fallocate=none). With multiple IO issuers this creates a mix of slab and page cache allocations over time. The total size of the files is 150% physical memory so that the slabs and page cache pages get mixed. 3. Warm up a number of fio read-only processes accessing the same files created in step 2. This part runs for the same length of time it took to create the files. It'll refault old data and further interleave slab and page cache allocations. As it's now low on memory due to step 2, fragmentation occurs as pageblocks get stolen. 4. While step 3 is still running, start a process that tries to allocate 75% of memory as huge pages with a number of threads. The number of threads is based on a (NR_CPUS_SOCKET - NR_FIO_THREADS)/4 to avoid THP threads contending with fio, any other threads or forcing cross-NUMA scheduling. Note that the test has not been used on a machine with less than 8 cores. The benchmark records whether huge pages were allocated and what the fault latency was in microseconds. 5. Measure the number of events potentially causing external fragmentation, the fault latency and the huge page allocation success rate. 6. Cleanup the test files. Note that due to the use of IO and page cache that this benchmark is not suitable for running on large machines where the time to fragment memory may be excessive. Also note that while this is one mix that generates fragmentation that it's not the only mix that generates fragmentation. Differences in workload that are more slab-intensive or whether SLUB is used with high-order pages may yield different results. When the page allocator fragments memory, it records the event using the mm_page_alloc_extfrag ftrace event. If the fallback_order is smaller than a pageblock order (order-9 on 64-bit x86) then it's considered to be an "external fragmentation event" that may cause issues in the future. Hence, the primary metric here is the number of external fragmentation events that occur with order < 9. The secondary metric is allocation latency and huge page allocation success rates but note that differences in latencies and what the success rate also can affect the number of external fragmentation event which is why it's a secondary metric. 1-socket Skylake machine config-global-dhp__workload_thpfioscale XFS (no special madvise) 4 fio threads, 1 THP allocating thread -------------------------------------- 4.20-rc3 extfrag events < order 9: 804694 4.20-rc3+patch: 408912 (49% reduction) thpfioscale Fault Latencies 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Amean fault-base-1 662.92 ( 0.00%) 653.58 * 1.41%* Amean fault-huge-1 0.00 ( 0.00%) 0.00 ( 0.00%) 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Percentage huge-1 0.00 ( 0.00%) 0.00 ( 0.00%) Fault latencies are slightly reduced while allocation success rates remain at zero as this configuration does not make any special effort to allocate THP and fio is heavily active at the time and either filling memory or keeping pages resident. However, a 49% reduction of serious fragmentation events reduces the changes of external fragmentation being a problem in the future. Vlastimil asked during review for a breakdown of the allocation types that are falling back. vanilla 3816 MIGRATE_UNMOVABLE 800845 MIGRATE_MOVABLE 33 MIGRATE_UNRECLAIMABLE patch 735 MIGRATE_UNMOVABLE 408135 MIGRATE_MOVABLE 42 MIGRATE_UNRECLAIMABLE The majority of the fallbacks are due to movable allocations and this is consistent for the workload throughout the series so will not be presented again as the primary source of fallbacks are movable allocations. Movable fallbacks are sometimes considered "ok" to fallback because they can be migrated. The problem is that they can fill an unmovable/reclaimable pageblock causing those allocations to fallback later and polluting pageblocks with pages that cannot move. If there is a movable fallback, it is pretty much guaranteed to affect an unmovable/reclaimable pageblock and while it might not be enough to actually cause a unmovable/reclaimable fallback in the future, we cannot know that in advance so the patch takes the only option available to it. Hence, it's important to control them. This point is also consistent throughout the series and will not be repeated. 1-socket Skylake machine global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE) ----------------------------------------------------------------- 4.20-rc3 extfrag events < order 9: 291392 4.20-rc3+patch: 191187 (34% reduction) thpfioscale Fault Latencies 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Amean fault-base-1 1495.14 ( 0.00%) 1467.55 ( 1.85%) Amean fault-huge-1 1098.48 ( 0.00%) 1127.11 ( -2.61%) thpfioscale Percentage Faults Huge 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Percentage huge-1 78.57 ( 0.00%) 77.64 ( -1.18%) Fragmentation events were reduced quite a bit although this is known to be a little variable. The latencies and allocation success rates are similar but they were already quite high. 2-socket Haswell machine config-global-dhp__workload_thpfioscale XFS (no special madvise) 4 fio threads, 5 THP allocating threads ---------------------------------------------------------------- 4.20-rc3 extfrag events < order 9: 215698 4.20-rc3+patch: 200210 (7% reduction) thpfioscale Fault Latencies 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Amean fault-base-5 1350.05 ( 0.00%) 1346.45 ( 0.27%) Amean fault-huge-5 4181.01 ( 0.00%) 3418.60 ( 18.24%) 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Percentage huge-5 1.15 ( 0.00%) 0.78 ( -31.88%) The reduction of external fragmentation events is slight and this is partially due to the removal of __GFP_THISNODE in commit ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings") as THP allocations can now spill over to remote nodes instead of fragmenting local memory. 2-socket Haswell machine global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE) ----------------------------------------------------------------- 4.20-rc3 extfrag events < order 9: 166352 4.20-rc3+patch: 147463 (11% reduction) thpfioscale Fault Latencies 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Amean fault-base-5 6138.97 ( 0.00%) 6217.43 ( -1.28%) Amean fault-huge-5 2294.28 ( 0.00%) 3163.33 * -37.88%* thpfioscale Percentage Faults Huge 4.20.0-rc3 4.20.0-rc3 vanilla lowzone-v5r8 Percentage huge-5 96.82 ( 0.00%) 95.14 ( -1.74%) There was a slight reduction in external fragmentation events although the latencies were higher. The allocation success rate is high enough that the system is struggling and there is quite a lot of parallel reclaim and compaction activity. There is also a certain degree of luck on whether processes start on node 0 or not for this patch but the relevance is reduced later in the series. Overall, the patch reduces the number of external fragmentation causing events so the success of THP over long periods of time would be improved for this adverse workload. Link: http://lkml.kernel.org/r/20181123114528.28802-2-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
Completed in 2186 milliseconds